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Analytic continuation of numerical data obtained in imaginary time or frequency has become an 
essential part of many branches of quantum computational physics. It is, however, an ill-conditioned 
procedure and thus a hard numerical problem. The maximum-entropy approach, based on bayesian 
inference, is the most widely used method to tackle that problem. Although the approach is well 
established and among the most reliable and efficient ones, useful developments of the method and 
of its implementation are still possible. In addition, while a few free software implementations 
are available, a well-documented, optimized, general purpose and user-friendly software dedicated 
to that specific task is still lacking. Here we analyze all aspects of the implementation that are 
critical for accuracy and speed, and present a highly optimized approach to maximum-entropy. 

Original algorithmic and conceptual contributions include (1) numerical approximations that yield a 
computational complexity that is almost independent of temperature and spectrum shape (including 
sharp Drude peaks in broad background for example) while ensuring quantitative accuracy of the 
result whenever precision of the data is sufficient, (2) a robust method of choosing the entropy 
weight a that follows from a simple consistency condition of the approach and the observation that 
information- and noise-fitting regimes can be identified clearly from the behavior of y 2 with respect 
to a, and (3) several diagnostics to assess the reliability of the result. Benchmarks with test spectral 
functions of different complexity and an example with an actual physical simulation are presented. 

Our implementation, which covers most typical cases for fermions, bosons and response functions, 
is available as an open source, user friendly software. 


I. INTRODUCTION 

Most calculations in quantum statistical mechanics, as 
applied in condensed matter physics, nuclear physics, 
and particle physics, rely on the Matsubara approach. 
The procedure is in principle well defined: Compute 
the appropriate correlation function in imaginary time 
or frequency with a given method, then perform the an¬ 
alytic continuation to obtain physically meaningful re¬ 
sults. However, when the calculations are numerical, the 
latter step is not straightforward, and is in fact an ill- 
conditioned problem for most worthwhile cases. This 
means that very high precision data is needed to obtain 
a reasonable precision in the final result. Therefore, al¬ 
though very simple qualitative information, such as the 
insulating or conducting character, can be deduced from 
imaginary time or frequency representation, obtaining 
quantitative information contained in spectral functions 
is a difficult task. 

For very accurate data, fitting an analytic interpolat¬ 
ing function and substituting the Matsubara frequency 
with a real frequency can yield good results. Rational 
polynomials, also called Pade approximants, are usu¬ 
ally the best choice for that procedure. [1-3] However, 
for noisy data, obtained from quantum Monte Carlo for 
example, Pade approximants often fail to give sensible 
results. Even for non-stochastic results, finite precision 
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round-off errors may yield unphysical results. [ ] Other di¬ 
rect fitting schemes based on known analytical properties 
of the result can also be used. [5] 

To extract as much information as possible from noisy 
Matsubara data, one must restrict the space of possi¬ 
ble spectral functions by using what is known a priori 
about the exact result. This type of approach, called 
Bayesian inference, has proven very successful for the 
analytic continuation problem. [6-9] In that context, in¬ 
stead of an interpolating scheme like the Pade approxi¬ 
mants approach, the maximum entropy (MaxEnt) proce¬ 
dure amounts to a smoothing of the data involving the 
minimization of a “free energy” Q = —aS, where \ 2 
is the quadratic distance between the data and the fit, S 
is an entropy relative to a default model, and a is an ad¬ 
justable parameter. [ ] The choice of a is critical since it 
controls the distance between the fit and the data, which 
has a very large effect on the resulting spectrum because 
of the bad conditioning of the problem. A few different 
approaches to determine a have been used so far. Al¬ 
though some are more reliable, it is not clear which one 
is the best approach. 

Other approaches use stochastic sampling over likely 
spectra for analytic continuation. [3, 10-14] The most 
widely used stochastic analytic continuation approach 
(SAC), computes the average spectrum according to the 
distribution exp(—x 2 /B)-[3, 10, 12—1 ] Therefore, % 2 in 
this approach is treated as the energy of a fictitious phys¬ 
ical system and O as the temperature. SAC can also 
be formulated using the bayesian principle, [1 ] but it 
does not, in principle, require a default model, or a grid 
choice [11], although one can be introduced to yield the 



2 


same limits as MaxEnt at large and small a. [12] However, 
it has been pointed out recently that the choice of a grid 
can be equivalent to the choice of a default model. [16] [ 7] 
As in the MaxEnt approach, there is no unique way of 
choosing the temperature 0 in SAC. Moreover, it typ¬ 
ically takes several hours to compute a single spectrum 
with SAC, hundreds of times longer than for MaxEnt. [14] 
It is also not clear if it systematically produces improved 
results over MaxEnt. Indeed, for reasonably precise data, 
both approaches should yield the same result. Consider¬ 
ing how much faster MaxEnt is, it is clearly the best 
method for these cases. For moderate or low precision 
data, only qualitative results can be obtained, whatever 
the approach used. The main question about SAC, which 
remains to be answered, is therefore whether it is bet¬ 
ter than MaxEnt to extract all the available information 
from the data in those cases. Note that recent improve¬ 
ments related to an optimal choice of bounds [15] should 
be considered in MaxEnt as well. 

MaxEnt thus remains the most practical approach 
available so far for analytical continuation, and although 
it has been used for more than two decades in statis¬ 
tical physics, we show here that relevant new develop¬ 
ments can still be made for the approach itself and its 
implementation. In particular, we introduce (1) tech¬ 
niques that minimize errors due to numerical approxi¬ 
mations and make them negligible compared to typical 
noise magnitudes, and at the same time yield a compu¬ 
tational complexity almost independent of temperature 
and spectrum complexity, which therefore ensure quanti¬ 
tative accuracy of the result, whenever data precision is 
sufficient; (2) a different approach to choose the optimal 
a that produces optimal results under some reasonable 
conditions, and that follows merely from internal consis¬ 
tency of the MaxEnt approach and from the observation 
that information- and noise-fitting regimes can be iden¬ 
tified clearly when \ 2 is plotted as a function of a on a 
log-log scale. An analogous idea [ .2], or variations [10], 
have been discussed before in the context of SAC and 
analogies to phase transitions, but here we justify why 
this procedure works in the context of MaxEnt without 
invoking phase transitions and we also show how to ver¬ 
ify its consistency; (3) graphical diagnostic tools to assess 
consistency of the above choice of a, the quality of the 
fit and the reliability of the resulting spectrum. 

To summarize, we present specific algorithms to op¬ 
timize all aspects of the maximum entropy approach, 
with the aim of extracting as much information as possi¬ 
ble from the data, while also keeping computation time 
low. Those algorithms are implemented in a user friendly 
software, freely available under the GNU general public 
license (GPL). The software can take as input data a 
fermionic or a bosonic Green function, correlation func¬ 
tion, or self-energy, given as a function of Matsubara fre¬ 
quency or of imaginary time, with diagonal or general 
covariance matrices. It treats normal Green functions, 
namely those for which the spectral weight A is posi¬ 
tive, namely A(u>) > 0 for fermions, while for bosons, 


A(u})/oj > 0. [ ] The code [19] was written in C++ 
using the Armadillo C++ linear algebra library. [20] 

The paper is organized as follows. A short review of 
Bayesian inference and the basis of maximum entropy is 
presented in Sec. II, then our algorithms are summa¬ 
rized in Sec. Ill, which contains in particular subsections 

III A on the kernel matrix definition, IIIB on the choice 
of the coefficient a of the entropy and III C on what we 
call “diagnostic tools” to check the level of confidence in 
the results. The step-by-step procedure is described in 
Sec. IV, which is divided into subsections IV A on the 
extraction of information directly from Matsubara data, 

IV B on the case of imaginary-time data, IV C on the grid 
and default model definition, IV D on the kernel, IV E on 
solving the minimization problem, and IV F on choosing 
the optimal a. Section V presents benchmarks, an ap¬ 
plication to a real physical problem, and the diagnostic 
tools. Finally, the discussion and conclusion are in sec¬ 
tions VI and VII, respectively. Mathematical details on 
all aspects of the various algorithms are provided in nine 
appendices. 

II. BAYESIAN INFERENCE AND MAXIMUM 
ENTROPY FOR ANALYTIC CONTINUATION 

Before addressing the algorithms, let us review the 
basic equations of the maximum entropy (MaxEnt) ap¬ 
proach. More details and discussion about those equa¬ 
tions can be found in the classic review of Jarrell and 
Gubernatis, Ref. 8. Appendix A presents an indepen¬ 
dent heuristic derivation that suggests the connection 
with stochastic analytic continuation. 

The main equation of the analytic continuation prob¬ 
lem for the Matsubara Green function is 

G{iui n ) = J du>K(iu> n ,uj)A(u>) (1) 

where G(iuj n ) is the input Green function (or correla¬ 
tion function) known numerically for a certain number 
of Matsubara frequencies iui n , and A(u>) is the spectral 
function to be determined. The function K(iu n ,ui), to be 
defined for a specific problem, is usually called the kernel. 
We assume here that A(ui) is a positive function. This 
covers the cases of “normal” Green functions, for which 
the true spectral function A*(w) is positive for fermions, 
while the condition A*(w)/w > 0 is satisfied for bosons. 
The Green function may also be known as a function of 
imaginary time, in which case the Fourier transform of 
Eq. (1) may also be used: 

G(t) = J du)K(T,u>)A(uj). ( 2 ) 

The kernels in those expressions are typically very 
sharp functions with large ratios between largest and 
smallest values. This makes it impossible to obtain a 
reasonable accuracy by numerically solving those equa¬ 
tions in a standard way, namely by simply discretizing ui 
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with as many points as there are values of r (or of iui n ) 
and solving the resulting linear system. That system 
would be too ill-conditioned. Nevertheless, the problem 
is not hopeless. Although it remains very difficult, using 
Bayesian inference makes it tractable. 

Consider a Green function G* related to a spectrum A* 
through (1) and another G, which is an approximation 
to G* known with accuracy a. Because of the bad condi¬ 
tioning of the system G = K A obtained from discretizing 
(1), the space of possible spectra A that can produce a 
function G' within G ± a contains very different spectra. 
The main idea behind Bayesian inference is to use known 
information about the true spectrum A* other than the 
constraint G — a < K A < G + o to restrict that space 
to only the most probable ones according to that prior 
information. The starting point to do that is Bayes’ rule, 


P(A\G) = 


P(G\A)P(A) 

P(G) 


( 3 ) 


P(A|G), the probability that, given the data G, we obtain 
the spectral weight A, is called the posterior probability ; 
P(G|A), the probability that, given A, we obtain the data 
G, is the likelihood ; and P(A) is the prior probability for 
A. For the present case, P(G ) is a constant that can 
be ignored since the data are fixed. The most probable 
A , given the data G, is thus obtained by maximizing 
P{G\A)P{A). If the data are generated using a stochastic 
method, the likelihood is obtained from the central limit 
theorem. For an element of G, we have 


P(Gi\Gi) oc e" 




( 4 ) 


where Gi is the expected value of G,; and of is the vari¬ 
ance. For all the elements of G, we thus have 


P(G|G) oc e“T 


( 5 ) 


where 


* 2 = £ 


(Gj - GiY 


( 6 ) 


If the covariance is not diagonal, one must instead use 


1. G = K A is a good approximation to the exact 
Green function. It is therefore a smooth function. 

2. The elements of AGj/ = Ul(G-G) =U t (G—KA), 
where U contains the eigenvectors of the covariance 
matrix C, are uncorrelated random variables. 

We come back to those assumptions at end of the present 
section and in section IIIB. 

The prior probability P(A) is more difficult to define. 
Its main property should be to favour the prior infor¬ 
mation we have about the spectrum, without introduc¬ 
ing any correlation not present in that information or in 
the data. [21, 22] In the present case, we want to enforce 
positivity and maybe a few global properties such as nor¬ 
malization and the first two moments of A. A form that 
satisfies those requirements is 

P(A) = e ^ r , ( 10 ) 

where 

Z s a = J VAe aS (11) 


and 


f dui ... , A(ui) 

S = -J 2i AMla YY) (12) 

is the relative entropy (also known as the Kullback- 
Leibler divergence), which becomes 


S 


£ 


Ac jj 

27r 


A(wi) In 


Ajuij) 

D{ui) 


(13) 


after discretizing w.[23] D(u>), called the default model , 
is the solution of the optimization problem without any 
data (for that form, the solution is actually e -1 D(u;)), 
and should thus contain merely the prior information 
about the spectrum. 

With this definition, the posterior probability becomes 
P(A|G) oc e aS ~^ . (14) 


x 2 = (G-G) T C- 1 (G-G) , (7) 

here in matrix form, where C is the covariance matrix. 
Now, given a spectrum A, G = K A, where K is the 
kernel matrix, and thus 

P(G|A)oce“£ (8) 

with 

X 2 = (G-KA) T C“ 1 (G-KA) . (9) 

Now, the basic assumptions when using (8) as the like¬ 
lihood are: 


This quantity must be maximized to obtain the spectrum 
A, hence the name maximum entropy. 

We still need to fix the value of a. There is however 
no unique prescription to do that. Three different ap¬ 
proaches have mainly been used to this day. Those are 
the historic aproach, the classic one, [24, 25] and Bryan’s 
approach. [26] 

In the histoidc approach, a is chosen such that x 2 = 
N , where N is the number of elements in G. This is 
indeed the expected value of y 2 if the elements of G are 
randomly distributed according to (8). However, as such, 
N is only an average and not the value to expect for a 
single sample G. In addition, the covariance matrix is 
not known exactly in practice, its elements being also 
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random variables, which adds uncertainty to the value of 
X 2 ■ The criteria y 2 = N at the optimal spectrum should 
therefore be a good choice only if N is large and the 
covariance is known accurately. Indeed, this approach is 
very sensitive to bad estimates of the error on the data. 
For example, if it is overestimated by a factor of 10, y 2 
calculated for the optimal spectrum, obtained with the 
correct error, will now be reduced by a factor of 100. The 
approach will also fail if the error is not well estimated 
for some of the data points, for example if some errors 
are overestimated in a frequency range with respect to 
another. In conclusion, the historic approach works only 
in ideal cases. 

In the classic approach, Bayes inference is used again 
to find the most probable a. Then Eq. (14) is maximized 
with a set to that value. [8, 24, 25, 27] In addition to the 
above, this involves the definition of a prior probability 
for a, P{a). According to Bayes rule, we have 


P(A, a\G) 


P(G\A, a)P(A, a) 

P(G) 

P(G\A,a)P(A\a)P(a) 

PW) 

P(G\A)P(A\a)P(a) 
P(G) ’ 


which, according to (8) and (10), becomes 
P{A,a\G) oc ^e aS ~^P(a). 

^ OL 


(15) 


(16) 


Now, by functionally integrating that expression with re¬ 
spect to all possible spectra A, one obtains the distribu¬ 
tion 


P(a\G) k^JvA e aS ~^ , (17) 


The value of a is then taken as the most probable one. 
To find the maximum, one needs a guess for the prior 
P(a). The most commonly used is 1/a. 

The classic approach assumes that P(a\G ) is a nar¬ 
row distribution centered around the maximum. When 
this assumption is not valid, the above reasoning leads 
to Bryan’s approach, [26] where the spectrum is given by 
the average spectrum 


A = 


da A a P(a\G ), 


(18) 


Our approach to choose a consists in finding the spec¬ 
trum that satisfies assumptions 1 and 2 given above. 
These assumptions are implicit when we use (8) as the 
likelihood, with y 2 given by (9). Since assumption 1 sets 
a lower bound on a and assumption 2 an upper bound, 
they define the region where the optimal a is located. 
Our criterion to choose a is equivalent to assuming that, 
at the optimal a, the spectrum contains as much of the 
information present in the data as possible, without con¬ 
taining its noise. If one can identify a range of a where 
essentially only information is fitted and a range where 
no more information, but only noise is fitted, then we can 
define the range where the optimal a is located, namely 
in the crossover between the two regions. Under reason¬ 
able conditions, which are discussed below, those ranges 
of a do exist. The default model is not required to be 
close to the spectrum in this approach. The procedure to 
locate the crossover region and choose precisely the value 
of a within it is described in more details in section IIIB. 


III. ALGORITHMS 

The following subsections describe the most important 
aspects of our maximum entropy implementation. More 
technical details are given in the appendices. 


A. The kernel matrix 


In the present discussion, we assume the thermody¬ 
namic limit, namely, that the spectrum A{u>) is a contin¬ 
uous function. 

The kernel matrix K in y 2 , Eq. (9), is obtained from 
a numerical approximation to 


or 


where 


G(iuj n ) 



dui A(ui ) 

27T ico n — LO 


G(r) 


du e~ WT A(uj) 
2tt 1 ± e ~’ 




(2 n + 1)7 tT , fermions , 

2mrT , bosons, 


(19) 


( 20 ) 


( 21 ) 


where A a is the spectrum at the value a. 

To perform the functional integral in (17), it is nec¬ 
essary to use a Gaussian approximation for the poste¬ 
rior probability P(A, a|G).[24, 25] This approximation is 
valid when the default model and the spectrum are close 
to each other and thus a remains large. [28] When this 
condition is not satisfied, P{a\G) in (17) is peaked at a 
value of a so small that it leads to overfitting. Therefore, 
in the classic and Bryan’s approaches, the default model 
must be chosen carefully. [8] 


n is an integer, T is the temperature in the same units 
as the real frequencies, /3 = 1/T, and — /? < r < /? is the 
imaginary time. Now, the form to use seems at first to 
depend simply on which type of data is available, G{iui n ) 
or G(t). However, assuming that G{iw n ) and G(r) are 
equally accessible, the choice between (19) and (20) ac¬ 
tually has important consequences numerically. Indeed, 
if we assume a piecewise polynomial approximation for 
A(w) between the discrete points Uj where the (trial) 
spectrum A is defined, then integrating analytically the 
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form (19) in each interval yields relatively simple and 
easy to evaluate expressions (see Appendix E). In that 
case, the kernel 1 /(iw„—w) is treated exactly, and the ac¬ 
curacy of the result only depends on how well the “true” 
spectrum A{uj) is approximated by the piecewise polyno¬ 
mial. More specifically, it depends on the interpolation 
scheme — linear, quadratic, cubic spline, etc — and on 
how well the grid resolves the structures in A(w). On 
the other hand, integrating (20) with the piecewise poly¬ 
nomial approximation for A(ui) produces quite cumber¬ 
some hypergeometric functions. The other possibility is 
to use a full numerical integration method, but then the 
grid must be adapted to both the spectrum A{ui) and to 
the kernel e - “ T /(l ± e~^ u ) to obtain a good accuracy 
for the integral. Since the kernel becomes increasingly 
sharp around ui = 0 as temperature decreases, numerical 
integration then becomes harder, regardless of the com¬ 
plexity of A{w). Therefore, if both G(iuj n ) or G(r) are 
accessible, it is much more convenient to use (19), in com¬ 
bination with piecewise analytical integration, instead of 
(20) to compute K. Now, as described in Appendix C, 
if G(r) is the input data, G{iu n ) can be computed accu¬ 
rately, using cubic splines, if the first two moments of the 
spectrum are known. These moments can be obtained by 
two different methods, i.e., by a commutator calculation, 
or by a polynomial fit to G(r) if the step in r is small 
enough. Hence, G{iuj n ) is relatively easy to obtain. 

The accuracy of the approximation 

G{ico n ) » K(iu n )A (22) 

to (19), written here in matrix notation with the line label 
explicit, depends on the choice of interpolation method, 
and depending on the noise level in the input data, this 
choice can be more or less critical. In general, the er¬ 
ror introduced by using the approximation (22) must be 
small compared to the noise in the input data Gi n (iio n ), 
so that the error on the spectrum A is a consequence of 
the error on the data, and not of the error resulting from 
numerical approximations. Indeed, although the use of a 
maximum entropy approach renders the inversion of the 
form (19) tractable, the spectrum A is still very sensitive 
to errors in G, and the integration error is equivalent 
to adding a systematic error to the data, which can dis¬ 
tort the resulting spectrum. Gaussian quadratures are 
usually the best choice for numerical integration. How¬ 
ever, they cannot be used in the present case because we 
want to treat exactly the kernel l/(iui n — w). We would 
thus use it as the weight function and would obtain grid 
points cCj and corresponding weights Wi that depend on 
w n . However, the result we are looking for is a vector 
A defined on a fixed grid. In that case, the best inter¬ 
polation method for a smooth function is a cubic spline, 
which is what we use. 

Another assumption on the spectrum A(w) that is im¬ 
plicit is that it is bounded, namely, it decreases rapidly 
at high |w|. For this type of behavior, polynomials in 1/w 
are much better approximations than polynomials in w. 
For that reason, below a certain frequency w; and above 


oj r , we use splines that are cubic in u = l/(w— w M ), where 
ujfj, takes different values on the left and right sides and 
is determined by the cutoff frequencies. The spline we 
use to model the spectrum is thus a “hybrid” spline. In 
addition, instead of using a dense grid in u> in the high 
frequency regions, we use a grid density that is uniform 
in it, and thus highly non-uniform in uj. For coi < to < u> r , 
the grid density can be adapted to spectral features, but 
is uniform in ui by default. The grid and the spline are 
thus perfectly adapted to each other, and the combina¬ 
tion of hybrid spline and non-uniform grid still yields 
highly accurate results for the integral (19) and also when 
computing moments of the spectrum (the use of mo¬ 
ments is discussed below), although the frequency step 
Ac dj = u>i+i — uJi increases rapidly in the high frequency 
regions, leading to a very sparse grid in those regions. 
Indeed, for example, when comparing the moments of a 
given spectrum, for which the moments are known ex¬ 
actly, computed with an ordinary spline and with the 
hybrid spline with the same non-uniform grid, the mo¬ 
ments obtained with the ordinary spline have very low 
precision, especially high order ones, while the moments 
obtained with the hybrid spline are closer to the exact 
results by a few orders of magnitude. As a result, the 
combination hybrid spline and non-uniform grid greatly 
reduces the number of frequency grid points, which re¬ 
duces computation time, while keeping high accuracy. 
Finally, the computation time can further be reduced by 
using a grid well adapted to the spectrum in the low fre¬ 
quency region (see section IV C and Appendix H for more 
details). 

Since the spectrum we are looking for is a function of 
frequency, it is also more convenient to work with G(ioj n ), 
instead of G(r). This allows for a straightforward opti¬ 
mization of the Matsubara frequency grid when the num¬ 
ber of Matsubara frequencies becomes large, (a) First, 
the part of the data that behaves as the asymptotic ex¬ 
pansion of G (see Eq. (B3)), say for oj n > ui as , typi¬ 
cally contains only information about a few moments of 
the spectrum. Therefore, the terms corresponding to fre¬ 
quencies larger than uj as in y 2 can be replaced with a few 
terms constraining moments. Namely, all the terms of the 
form (G in (iw n ) — K(iu> n )A) / <j(u> n ) with u n > oj as are re¬ 
placed with a few terms of the form (Mj — rrijA) 2 /(Jm-, 
where Mj is the j th moment of the true spectrum, cjm, 
its standard deviations, and rrij is a line vector such that 
irijA computes the j th moment of the trial spectrum A. 
In practice, the frequencies ui n > uj as in the vector G 
are replaced with the Mj’s and the lines corresponding 
to those frequencies in the matrix K, replaced with the 
line vectors rrij’s. This substitution reduces the num¬ 
ber of terms in y 2 essentially without losing informa¬ 
tion contained in the data, (b) The second optimization 
of the Matsubara frequency grid follows, first, from the 
fact that the low frequencies of A(uj) are related to the 
low Matsubara frequencies of G(iui n ), and similarly for 
other frequency ranges, although not in a local manner, 
and second, from the fact that, in systems of interacting 
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particles, quasiparticle lifetime decreases with frequency, 
and thus the structures in A(w) are typically broader as 
w increases. Consequently, most of the time, not all the 
terms corresponding to Matsubara frequencies below w as 
need to be kept in % 2 , and the frequency grid can actu¬ 
ally be made more sparse as w n increases, again while 
keeping the most relevant information contained in the 
data. The non-uniform Matsubara frequency grid we use 
is described in Appendix I. Those two optimizations of 
the Matsuabra frequency grid greatly reduce computa¬ 
tion time without noticeable effect on the result. In fact, 
they modify the scaling with temperature of the num¬ 
ber of terms to be included in y 2 , and once they are 
combined with the piecewise analytical integration of the 
form (19), and the adapted non-uniform real frequency 
grid, the computational complexity becomes only weakly 
dependent on spectrum complexity and temperature. 


Note that the spectral moments are mentioned in Ref. 
[8] , but only as prior information useful to define the de¬ 
fault model. Although, this yields a good default model, 
if they are not included in x 2 as well, their importance 
in the problem then becomes smaller as a decreases. We 
also use the first and second moments to define a good de¬ 
fault model, but because we use them in % 2 as well, they 
are treated in the same manner as the rest of the data, 
therefore as strong constraints on the spectrum. The 
moments have also been used as constraints in stochastic 
analytic continuation. [10, 12, 14, 29] 


To summarize, we use the representation (19) with a 
piecewise polynomial approximation for A(w) which al¬ 
lows easy evaluation of the integral by piecewise ana¬ 
lytical integration. This operation solves the problem 
of difficult numerical integration at small w n that wors¬ 
ens as temperature decreases. In addition, we use a hy¬ 
brid spline with a matching non-uniform real-frequency 
grid adapted to spectrum structure and a truncated non- 
uniform Matsubara frequency grid with constraints on 
moments included as terms in y 2 . The combination of 
those techniques yields in the end a computation time 
that is only weakly dependent on temperature and spec¬ 
trum complexity. Note that the hybrid spline and the 
piecewise analytical integration have already been used in 
a previous maximum entropy implementation, although 
only for the optical conductivity. Also, the frequency grid 
was less sophisticated and the other aspects described in 
the following sections where not present. This implemen¬ 
tation is described in Appendix F of Ref. [30] . 


The kernel matrix definition is described in details in 
Appendix E. More discussion on the grid definition ap¬ 
pears in section IV C and details are given in appendices 
H and I. Also, some preliminary steps are necessary be¬ 
fore defining the grid and computing the kernel matrix. 
Those steps are discussed in section IV A. 


B. The optimal a 


Let us assume for now that the spectrum A(w) has 
been computed for a wide range of a , starting at large 
values, where the spectrum minimizing x 2 /2 — aS is es¬ 
sentially the default model Z?(w) (which maximizes S 
alone), to small values, where the term \ 2 dominates. 
If we plot the function y 2 (a) in log-log scale, we obtain 
a shape similar to the schematic curve of Fig. 1, where 
three different regimes are found: At large a, x 2 does not 
change for a certain range of a where it is negligible com¬ 
pared to aS , and thus the spectrum barely departs from 
the default model. Let us call that region the default- 
model regime. Then, below a certain value of a, while 
aS is still the largest term, y 2 is not negligible anymore 
and plays the role of a constraint. In that range, reduc¬ 
ing a has a strong effect on x 2 > as it directly controls 
how well that constraint is satisfied. We call that range 
of a the information-fitting regime. Then, below some 
value of a , the effect of decreasing that parameter has 
a much smaller effect on y 2 . This is in fact the region 
where the noise in G is also being fitted, which we call 
the noise-fitting regime. 

The correspondence between the small slope region at 
low a and the range where the noise is being fitted can 
be made easily by looking at the function AG(iw n ) = 
G in (ioj n ) — K(iuj n )A (diagonal covariance case) at differ¬ 
ent values of a. Above the value of a where the slope 
starts decreasing rapidly, AG(iw ra ) has correlations be¬ 
tween frequencies iui n , while below that value, that func¬ 
tion contains mainly noise. This indicates that, at the 
point of the drop, G ou t{iuj n ) = K(iuj n )A is a good fit to 
the data Gi n {iuj n ), but because G ou t{iuj n ) does not yet 
contain noise, AG is essentially the noise in Gi n (iuj n ). 
This will become more clear in section V. 

Although the global behavior of x 2 ( a ) i n the default- 
model regime and the information-fitting one is intuitive, 
why the slope is much lower in the noise-fitting region is 
more subtle. A very small decreasing rate of \ 2 with 
decreasing a is expected at very small a, deep in the 
noise fitting region, when the term % 2 dominates and 
the entropy term mainly prevents A(u>) from becoming 
negative where it is close to zero. However, in the range 
of a where the slope decreases rapidly with decreasing 
a, the entropy term is still in fact much larger than x 2 , 
and A and G out = K A are smooth functions, namely, no 
noise from the data has been fitted yet. To understand 
the drop in the decreasing rate of y 2 with a, expression 
(G3), 


6Aj = A ctj 


K J K + a,AwA 


j-i 


x ^Awln(D 1 Aj_i) + Aw^j . (23) 


from Appendix G is very useful. Here, Aw, D, and A are 
the diagonal matrices with Aw, D and A as their diag¬ 
onal, respectively. The spectrum at ctj = otj~i — A ctj is 
given by Aj = Aj- i + SAj, assuming that the spectrum 
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Log (x 2 ) 



FIG. 1. Schematic behavior of log(y 2 ) vs log(a). Three 
regimes can be identified. 


Aj— i at oy _1 is known and that SAj is small. An impor¬ 
tant point about expression (23) is that it does not explic¬ 
itly contain the data Gi n: or AG = Gi n — KA, which is es¬ 
sentially the noise in the data when most of the available 
information is contained in A. Instead, 8A is a smooth 
function at the crossover between the information- and 
noise-fitting regions since Aj-\ is smooth at that point, 
and therefore SG = K 5A is necessarily also smooth and 
cannot reduce by much the amplitude in the noisy func¬ 
tion AG. To summarize, expression (23) tells us that, 
not only does the entropy term in y 2 /2 — aS controls the 
distance AG, it also enforces smoothness on A, which 
prevents easily fitting the noise. 

This behavior in x 2 as function of a is also observed in 
stochastic analytic continuation (SAC). In that context, 
where x 2 is the energy of the system, the drop in the 
rate of change in \ 2 as a function of temperature corre¬ 
sponds to a drop in the specific heat, and thus to a phase 
transition. [12] Although the entropy does not appear ex¬ 
plicitly in SAC, the statistical treatment in the canonical 
ensemble implies that it is maximal, given the value of 
the average “energy” (y 2 ). The drop in d\ 2 /da observed 
in both approaches must therefore have a similar origin 
(the statistical entropy in SAC is equal to In Z + (% 2 )/20, 
where the partition function Z = Tr[exp(—y 2 /20)] is a 
trace over all configurations {A(wj)Aw,}). The phase 
transition analogy of Beach [12] that we just discussed 
can also be seen in the sudden entropy drop discussed by 
Sandvik [10]. 

Now, because we want the spectrum to contain the in¬ 
formation in G, but not its noise, it is clear that the op¬ 
timal a is somewhere in the crossover region between the 
information- and the noise-fitting regimes. This is also 
where the basic assumptions for the likelihood, assump¬ 
tions 1 and 2 in section II, become valid, and therefore 
where the maximum-entropy approach is consistent. 

The crossover region is well delimited if the covariance 
matrix C is a good estimate of the actual one. This 


is not so on the other hand when the covariance is not 
well estimated in some frequency ranges. To illustrate 
this, let us first consider a diagonal covariance for sim¬ 
plicity. Then it is the standard deviation a that con¬ 
trols the rate at which G out becomes close to Gj n as a 
decreases. When all the information contained in Gi n 
is in the spectrum A , but not its noise, the function 
AG(icu n ) = GijiijiiOn) G ou t{iiOn) = Gi n (iuj n ') Ki^iuj^A 
contains mostly noise. Indeed, if G ou t is smooth and is 
a good fit to G in , which is the basic assumption here, 
then AG(ioj n ) is approximately the noise in Gi n . Now, if 
a(i(jj n ) is well estimated, A G(ico n ) becomes noisy around 
the same value of a for all values of iu> n . However, assume 
that cr is underestimated in a given frequency range, then 
the corresponding A G{iuj n ) becomes noisy at a larger 
a in that range of frequencies. Then, at the value of 
a where A G(ico n ) becomes noisy in the other frequen¬ 
cies ranges, the frequency range where a was underesti¬ 
mated will be overfitted. Therefore, when the ratio of 
the cr(iL 0 n ) between different frequency ranges is incor¬ 
rect, some noise is fitted in some frequency ranges while 
some information has not yet been fitted in others. Large 
errors in cr therefore reduce the range of a where only in¬ 
formation is fitted, and create a large crossover region, 
where both noise and information are fitted at the same 
time. On the other hand, if a is a good estimate, the 
crossover region is narrow. When the covariance is not di¬ 
agonal, the analysis is the same, but we have to consider 
a and AG expressed in the eigenbasis of the covariance 
C. 

Because of the rapid change of slope in log x 2 as a func¬ 
tion of logo; at the crossover between the information¬ 
fitting and noise-fitting regimes, there is a peak in the 
curvature of that function at the crossover. We choose 
the optimal a , say a *, at the maximum curvature, and 
we look at the variation of the spectrum within the width 
of the peak to obtain an estimate of the uncertainty of 
the spectrum. No observable change to the spectrum 
within the crossover region points toward good quantita¬ 
tive accuracy of the results. Other diagnostic quantities, 
discussed in section III C, can be used to analyze system¬ 
atically the results to assess the quality of the fit and the 
accuracy of the resulting spectrum. 

Unlike in the classic and Bryan methods, the reliability 
of this approach to choose a does not depend on the 
proximity between the spectrum and the default model. 
Indeed, it yields the correct value for the optimal a even 
when the spectrum is extremely different from the default 
model, as illustrated by the second example of section 
V (Fig. 4). This is an important advantage over the 
conventional methods since very little prior information 
is needed on the spectrum to obtain good results. 

In the classic or Bryan’s methods, the following pro¬ 
cedure is often used to obtain a default model that is 
close to the final result: (a) One obtains data at high 
temperature, where the spectrum is simple and the ana¬ 
lytic continuation easy, (b) The result for the spectrum 
is then used as the default model for a slightly lower tern- 
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perature. (c) This annealing process is pursued until the 
lowest temperature. [27] We could also use this annealing 
procedure, but it is not necessary in our case. Note that 
a variation of a is different from annealing since the data 
and default model are not changed as a is varied. 

The choice of a at the point where the slope of log y 2 
as a function of log a drops has been used before with 
the stochastic approach. [12, 14] The justification for this 
choice was not clear however. The analysis in the context 
of the maximum entropy approach clarifies why a should 
be chosen in that region. 

To apply this approach to find the best a , the spectrum 
must be computed typically for a few hundred values of a. 
We thus need an efficient method to minimize y 2 /2 — aS. 
This part of the calculation is discussed briefly in section 
IV E and details are given in Appendix F. 

C. Diagnostic tools 

As mentioned above, Gi n (iuj n ) — G ou t(ioJ n ) can be 
used to determine when most of the information has 
been fitted. Now, assuming again a diagonal covari¬ 
ance matrix, a more convenient quantity to analyze 
is the normalized function AG(zw n ) = [ Gi n (iuj n ) — 
Gout{iu n )]/<j(iu>n)- Indeed, if a is accurate, A G(iuj n ) 
should contain only noise of constant amplitude at 
a*, with a variance around unity. Its autocorrelation 
AG 2 (An) = (1/AO Yl n AGK)AG(w n+ An) must there¬ 
fore look like a Kronecker delta, although noisy because 
N is finite. However, once the optimal a is reached, the 
shape of AG(zw n ) and AG 2 (An) remains qualitatively 
similar at smaller a , and their amplitude decreases very 
slowly. By contrast, at large values of a, AG(zw n ) is a 
smooth function of ui n , since G ou t(iuJ n ) is smooth and 
farther away from Gi n (iu ) n ) than <j(iw n ), and AG 2 (An) 
is broad since long range correlations are present in 
AG(ico n ). Therefore, those two functions provide a way 
to determine whether a > a* or a < a*, i.e. whether 
the optimal a has been reached. 

The function AG(zw n ) (or A Gi where i is a covari¬ 
ance matrix eigenvector index), is also very useful to tell 
whether the error estimate is good. Indeed, if it is the 
case, at values of a where A G(iuj n ) is noisy, the noise 
amplitude will be comparable at all frequencies. If some 
ranges of frequencies in AG(ioj n ) are noisy while oth¬ 
ers still contain correlations, this means that the ratio of 
the errors between those frequency ranges is off. In such 
cases, different frequency regions in A(u) will “converge” 
at different values of a, an undesirable state of affairs. 

If C is not diagonal, the functions AG and AG 2 to 
be analyzed must be expressed in the eigenbasis of the 
covariance matrix C. Therefore, in that case we use 

AG = G in -G out , where G = Vc-'WG and U contains 
the eigenvectors of C, such that C = IJfCU is diagonal. 

Finally, other quantities that are useful to estimate the 
quality of the analytic continuation are the curves of A(<J) 
at a few sample frequencies as a function of log a. 


As we will see in specific examples treated in section V, 
if the quality of the data is sufficient, those curves have 
plateaus in a range of a around a*. If precision is not 
sufficient to observe plateaus, inflexion points or extrema 
will typically still be present in the curves around a*. In 
other words, there are local minima in |GL4(cu)/<iloga| 
around a*. Then, at a certain value of a below a*, the 
sample spectra A(u> s i amp ) have unpredictable behavior, 
increasing or decreasing quickly, as the system’s condi¬ 
tioning worsens. The A(uj s i amp ) as a function of logo 
curves also tell if the error is well estimated. If it is, 
the A(io^ amp ) at the different sample frequencies have 
the same optimal a. Otherwise, if the ratio of the er¬ 
rors in different frequency ranges is off, the local minima 
in \d,A(w s i amp )/d\oga\ above the onset of unpredictable 
behavior will appear at different values of a for different 
frequencies. In those cases, and if improving the estimate 
of the covariance is impossible, the curves of A(w/ amp ) 
may be very useful to select the optimal a from a specific 
real frequency range instead of the average value obtained 
from the curvature of logy; 2 as a function of log a. This 
may also be preferable to averaging the spectrum over a 
range of a, since there is less risk of “contaminating” the 
spectrum in the chosen frequency range with underfitted 
or overfitted spectra. 

To summarize, in addition to the function y 2 (a:) ; three 
other quantities give information about the optimal value 
of a and more: A G(icu n ), AG 2 (An), and A(u s i amp ) ver¬ 
sus log a. Those quantities also allow to evaluate the 
quality of the fit and the reliability of the result. Practi¬ 
cal examples of this analysis are given in section V. 

IV. STEP-BY-STEP PROCEDURE 

Here we discuss briefly the steps in our maximum en¬ 
tropy implementation. The technical details are left in 
appendices. 

A. Extracting information directly from the input 
Green function 

Under certain conditions, we can obtain relatively eas¬ 
ily two types of information directly from the Matsubara 
data: moments and a low frequency peak width and 
weight if present. This information is then used through¬ 
out the preprocessing stage, allowing important opti¬ 
mizations, some of which have been discussed in section 
III A. The first step in the procedure is therefore to ex¬ 
tract that information, when available. 

1. Extracting moments 

If the moments are not known, and if there are enough 
Matsubara frequencies, or imaginary time slices, such 
that the information on the moments is actually present 
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in the data, then the moments can be extracted from 
the Matsubara data. At least the first two moments are 
generally well defined. The procedure to extract the mo¬ 
ments basically involve fitting a Matsubara frequency, 
or imaginary time, asymptotic expansion to G(iw ra ), or 
G(r), respectively, using a least-squares fitting method. 
The details for both cases are given in Appendix B. In the 
software, the moments can also be provided by the user. 
Note that, for a given model, the expressions for the first 
few moments as functions of Hamiltonian parameters can 
be obtained by straightforward commutator calculations. 
This is documented in standard many-body textbooks or 
here [31]. We do not discuss this calculation here, but 
only consider the analytical continuation procedure it¬ 
self, starting from some numerical Matsubara data that 
can be represented by expression (19) or (20), and useful 
additional data, such as a covariance matrix and spectral 
moments. The approach we use here is meant to apply 
in the most general case possible, and not only to specific 
Hamiltonians. 


2. Finding the width and weight of a quasiparticle or Drude 

peak 


Let us assume we know Gi n (iu) n ) either directly, or 
from the Fourier transform of G,; n (r) (see Sec. C be¬ 
low). It can be very useful to be able to tell directly 
from G in (iui n ) if the system is metallic and to have an 
estimate of the width and the weight of the quasiparticle 
peak, or of the Drude peak in the case of optical conduc¬ 
tivity. This information can then be used to determine 
the optimal frequency step around w = 0. 

When there is a well-defined peak near to = 0, with 
weight essentially below a frequency W qp well separated 
from the rest of the spectrum, whose weight we assume 
is mostly above a frequency Wi nc , then the Green func¬ 
tion in the frequency range W qp < w n < Wi nc can be 
expanded in a Laurent series 


G LS (iu n ) « -(*w„) i - 1 M“ c - ... - (iw n ) 2 AP” c 
Mq P M qp M qp 

iui n + ( iw n ) 2 + + ( i(jJ n ) N+1 

(24) 


-(iw„)Mi" 2 c -Mi r 


me 
1 “ 


In the above expression, the M™ c s are the inverse mo¬ 
ments of the high frequency part of the spectrum, and 
the M qp s are moments of the quasiparticle peak. This 
expression is for a fermionic Green function. Those par¬ 
tial moments can be obtained by the same fitting proce¬ 
dure as the one used to extract the moments of the com¬ 
plete spectrum. The quasiparticle peak weight is then 
Mq P , its position is given by M qp /M$ p , and its width 
is \J M qp /M q P — (M qp /M q y. This procedure will give 
accurate values if the number of Matsubara frequencies 
satisfying W qp < ui n < Wi nc is sufficient to ensure con¬ 
vergence of the partial moments. If that condition is 
not satisfied, it will nevertheless produce the right orders 


of magnitude. Knowing the width of the low frequency 
peak is useful to define the frequency step around w = 0, 
and thus to define a frequency grid adapted to the spec¬ 
trum’s sharpest part. More details are given in Appendix 
D, with the discussion for the bosonic case. 


B. G(t) as input data 


As mentioned in section III A, we use the representa¬ 
tion (19) as the relation between G and A during the 
calculation. Therefore, when the input data are given as 
a function of imaginary time Gj n (r), we need to Fourier 
transform it to obtain Gi n (ico n ). To do so, knowing some 
spectral moments becomes essential. Indeed, to obtain 
a G in (ita n ) as accurate as possible, i.e. without adding 
large errors, the best approach is to Fourier transform 
the cubic spline of Gi„(r), using the first and second 
moments to define the two additional equations required 
to define the spline. Using splines and partial integra¬ 
tion for the transform enforces [30, 32] the required [33] 
asymptotic behavior of Gj n (iw„), and also produces more 
accurate results at low frequency than a discrete Fourier 
transform. The Fourier transforms of the data and the 
covariance matrix will add some noise, but the use of fast 
Fourier transforms minimizes that noise by minimizing 
the number of operations in the transforms. More de¬ 
tails on this procedure, and how to obtain the covariance 
matrix of Gi n (iui n ) from the covariance of Gj„(r), are 
described in Appendix C. Once the moments, G, n (tw„), 
and its covariance matrix are computed, G r „(r) is not 
used anymore in the calculation. 


C. Frequency grid and default model 


Optimizing the real frequency and the Matsubara grids 
is critical for the efficiency of the maximum entropy cal¬ 
culation. As shown in Appendix F, each iteration in the 
minimization of Q = \'x 2 — aS requires 0(N% n N u ) op¬ 
erations (for N Un < N u ). The number of values of a 
that need to be computed is typically a few hundreds. 
In addition, the grid should allow a good resolution of 
all the structures in the spectrum. Optimizing both the 
Matsubara and real frequency grids is thus necessary to 
keep computation time reasonable without affecting the 
quality of the result. 

To define a grid adapted to the spectrum, while min¬ 
imizing the number of points, we use a dense grid in 
the main spectral range [wj,w r ], where most of spectral 
weight is found, and a step size that increases with w 
outside that range. By default, in the software, the step 
Aw in the main spectral range is constant and defined 
by setting (w r — uii )/Aw to a preset value, which can be 
modified by the user. At high frequency, the grid has a 
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constant Am increment where 


m = 


id — UJQr 

1 


OJ > oo r 

LU < UJi . 


(25) 


The parameters woz and w 0r are determined by the high 
frequency cutoff, which can be modified by the user (see 
Appendix H for more details). The frequencies u>i and 
ui r that define the main spectral region are set by de¬ 
fault to Mi — A gjstd and M\ + A u i s td, respectively, where 
Aoj st .d = s/M 2 /M 0 — (M 1 /M 0 ) 2 is the standard devia¬ 
tion of the spectrum. This choice of grid at high fre¬ 
quency is the most natural one considering the type of 
interpolation method used to define the kernel matrix K, 
and discussed in section III A. It also efficiently reduces 
the number of points in the grid, which helps keep the 
computation time short. The grid in u is defined such 
that the step in to is continuous at the boundaries of the 
main spectral region. This is actually important to ob¬ 
tain smooth spectral functions. Further details on the 
grid definition are provided in Appendix H. 

The software offers a few possibilities to define a more 
adapted grid in the main spectral range. The simplest 
choice given to the user is to choose the boundaries and 
the step Aw in that range. There is also a tool to define a 
non-uniform grid with a smoothly varying step between 
intervals of different step sizes provided by the user in 
the form of a vector 


[wi Awi w 2 Aw 2 W 3 ... Awjv-i wat] , (26) 

where the frequencies w^ define the intervals’ boundaries 
and Aw, the corresponding step size in each interval. 
Given this input by the user, the frequency step is de¬ 
fined to follow a hyperbolic tangent shape around the 
boundaries. An example of the resulting frequency grid 
density 1/Aw as a function of w corresponding to three 
intervals of different step sizes is illustrated in Fig. 6 of 
Appendix H. The reason for forcing a smooth variation 
of the grid between different intervals is that discontinu¬ 
ities in Aw can cause spurious oscillations in the results. 
Finally, the user can also provide a customized grid for 
the main spectral region. The grid outside the main spec¬ 
tral range is always defined in the same way. 

If a quasiparticle peak has been found and character¬ 
ized using the Laurent series fit to G(iio n ) described in 
section IV A 2, then the step Aw around w = 0 is set to a 
fraction of the width of the peak and a vector of the type 
(26) is generated automatically to define a non-uniform 
grid with a small number of points for the main spectral 
region. This feature allows the possibility of automati¬ 
cally computing the spectra for large sets of data. 

As for the Matsubara frequency grid, if the moments 
are used as constraints, the Matsubara frequencies cor¬ 
responding to the asymptotic part can be removed. This 
is the simplest modification of the Matsubara grid that 
can improve computational efficiency without losing ac¬ 
curacy. In addition, when the number of Matsubara fre¬ 
quencies below the cutoff becomes large at low tempera¬ 
ture, a non-uniform grid can be used to further reduce the 


number of frequencies. This is because sharp features in 
the spectrum are usually located at low frequency, while 
the spectrum broadens as |w| increases. Therefore, as the 
magnitude of w ra increases, the grid can become sparser. 
The grid we use is given in Appendix I. 

The default model should contain the information 
known about the spectrum, but should otherwise be fea¬ 
tureless, to ensure that any structure appearing in the 
spectrum comes necessarily from the data. In the pro¬ 
gram, the default model is defined as a Gaussian with the 
same first two moments as the spectrum, if they have ei¬ 
ther been extracted from the high frequencies of the data 
or else provided by the user. Otherwise, if the grid in 
the main spectral region is defined by the user, then the 
width and center of that region are used as the width and 
center of the default model, respectively. Other default 
models, such as a flat spectrum, are available and can be 
selected by the user. Finally, the default model can also 
be entirely provided by the user. 


D. Defining the kernel matrix 

The kernel matrix can be defined once the real fre¬ 
quency and Matsubara grids are defined. The general ap¬ 
proach was discussed in section III A. In practice, defin¬ 
ing K with the hybrid spline interpolation method de¬ 
scribed in section III A involves analytically integrating 
(19) in the case where A(w) is a cubic polynomial in w 
and the case where it is a cubic polynomial in u, defined 
by Eq.(25). This yields a linear relation between the 
vector G and a vector formed by the spline coefficients. 
Then, the linear relation between those coefficients and 
the vector A are obtained by computing the general so¬ 
lutions for the linear systems of equations defining the 
splines in the different regions of the real frequency grid 
(low and high frequency regions). The details are given 
in Appendix E. 


E. Computing the spectrum as a function of a 

To obtain the spectrum as a function of a in the rele¬ 
vant range, we have to minimize Q = \— aS , where \ 2 
is given by (9) and S , by (13). This function is bounded 
from below and is convex. It therefore has a unique min¬ 
imum that is the solution to VQ = 0. Although that 
equation cannot be solved exactly because the entropy 
part is non-linear, we know that, for large a, the solution 
must be close to the one minimizing the entropy term 
only, namely A = e~ 1 D. In addition, if the solution A a 
at a given a is known, then the solution for a new value of 
a, A a /, can be obtained by Newton’s method, provided 
that A a / is close to A a . In other words, in that case we 
can solve iteratively a linearized version of the system 
VQ = 0. 

Therefore, the solutions for the whole a range can be 
obtained by starting from a large value of a, using e~ l D 
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as the initial spectrum, and decreasing a at a rate such 
that | A a — A a t\/A a is small, so that the minimization 
routine converges. The starting value of a is chosen to 
ensure that the first solution is very close to the default 
model. The calculation stops when the slope in logy 2 as 
a function of log a is smaller than a certain fraction of its 
maximum value in the whole a range (see the last two 
paragraphs of Appendix F). 

There are a few obstacles to overcome when solving the 
linearized system VQ = 0. On one hand, the system is 
ill-conditioned and, on the other hand, negative values of 
A can appear because of linearization, even though the 
entropy term forbids it initially. The bad conditioning 
can be dealt with by using a preconditioning step, and 
by solving the system using a singular value decompo¬ 
sition (SVD). [26, 27] Note that all the singular values 
are kept. The SVD is typically more efficient numeri¬ 
cally than Gaussian elimination for that particular type 
of system. As for the problem of negative values of A, 
when the argument of log[A(wj)/D(o;j)] in VS is below 
the smallest numerical value possible, the log is smoothly 
continued by a quadratic function. Although not strictly 
forbidding negative values, this strongly penalizes them 
and ensures stability of the algorithm. As a result, neg¬ 
ative values of very small amplitude may still appear, 
but only in regions where the spectrum should actually 
vanish. The minimization approach is described in more 
details in Appendix F. 


F. Finding the optimal a 

To find the optimal a, we compute the curvature in 
logio X 2 as a function of 7 log 10 a, where 7 is an ad¬ 
justable parameter smaller than 1. The local curvature 
is computed by fitting an arc of a circle to a section of 
the curve, and taking the inverse of the radius as the 
curvature. The sign is chosen to be positive if the circle 
center is above the curve and negative if below. Although 
fitting an arc is numerically more tedious than fitting a 
parabola, it was found to produce a smoother curvature 
as a function of a. Then, as discussed in section IIIB, 
the optimal a is chosen at the maximum of the curvature. 
The parameter 7 is used to increase the amplitude of the 
peak corresponding to the crossover region between the 
information- and the noise-fitting regions, compared to 
other peaks that may appear in the information-fitting 
region. A value of 7 smaller than 1 also shifts the loca¬ 
tion of the maximum toward lower a, but not by a large 
amount if 7 is not too small. The value of 0.2 was found 
to be a good compromise to identify clearly the correct 
peak in the curve, without shifting too much the value 
of a. This parameter is considered as a fixed internal 
parameter in the program, but its default value can be 
modified easily by the user. 


G. Diagnostics 

At the end of the calculation, in addition to the op¬ 
timal spectrum, the curves used to find the optimal a, 
namely, log 10 y 2 as a function of log 10 a and its curvature, 
are displayed, along with the diagnostic quantities dis¬ 
cussed in section III C. Analyzing those quantities yields 
information about the quality of the fit, the reliability of 
the result and, if necessary, how to improve the result. 
Section V gives examples of this analysis. The documen¬ 
tation on the software also explains how to perform the 
analysis. [19] Since the computation takes only between 
a few seconds and a few minutes, trying different sets of 
input parameters to improve the result is easy. For ex¬ 
ample, between iterations one can modify the frequency 
grid parameters to obtain the grid that optimally resolves 
all structures in the spectrum. 


V. BENCHMARKS, APPLICATIONS, AND 
DIAGNOSTIC TOOLS FOR ACCURACY 

To test our approach, we first use the program to ana¬ 
lytically continue Green functions obtained from insert¬ 
ing spectral functions in the spectral representation (19). 
For the purpose of applying maximum entropy, Gaussian 
noise is added to those Green functions. By comparing 
with the exact spectrum we can then assess the accuracy 
that can be reached by the program. Finally, an example 
with a real physical Monte Carlo simulation is given. In 
all examples, frequency and temperature are in energy 
units. 


A. Two examples with known exact results 

We illustrate our approach first with a simple spec¬ 
trum, and then with a difficult case that has both coher¬ 
ent and incoherent features. 

The first example, shown in Fig. 2, is the analytic 
continuation of a fermionic Green function given as a 
function of Matsubara frequency, with a diagonal covari¬ 
ance, a constant relative error of 10 —5 , at a temperature 
T = 0.05. The spectrum used to compute G{iui n ) from 
(19) is the sum of two Gaussians. It is shown in green 
in Fig. 2(a). In Fig. 2(b), we observe clearly the three 
regimes discussed in section III B for y 2 as a function of a 
on a log-log plot. If we take the optimal a at the position 
of the maximum in the curvature of log y 2 as a function 
of 7 log a, where 7 = 0.2, shown in Fig. 2(c), we obtain 
the spectrum in red triangles shown in (a). 

Figure 2(d) shows, as a function of a, the spectral func¬ 
tion at the frequencies corresponding to the two maxima 
and to the minimum between them in Fig. 2(a). As dis¬ 
cussed in section III C, the plateaus in the curves indicate 
convergence of the spectrum at those frequencies, which 
is expected for sufficiently precise data. As seen from the 
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FIG. 2. a) Spectrum obtained with the program at a tem¬ 
perature T — 0.05, with a constant relative diagonal error of 
10 ~ 5 , b) \ 2 as a function of a for the example in a), c) curva¬ 
ture computed from log 10 \ 2 as a function of 7 log 10 a, where 
7 = 0.2, plotted as a function of log 10 a, and d) spectrum 
as a function of a at sample frequencies corresponding to the 
three extrema of the spectrum in (a). 


FIG. 3. Normalized errors and auto-correlations as a func¬ 
tion of frequency index at three different values of a for the 
example given in Fig. 2. a) error and b) autocorrelation at 
a = 1000a*, c) error and d) autocorrelation at a = 10a*, e) 
error and f) autocorrelation at a*. 


optimal spectrum shown in Fig. 2(a), those plateaus oc¬ 
cur at values that coincide with the exact spectrum with 
very good accuracy. Other useful information is obtained 
by comparing the curves to each other. A good overlap 
of the plateaus over some range of a indicates that the 
different parts of the spectrum reach their optimal value 
around the same a. When this is not the case, then the 
ratios of the standard deviations a(iui n ) between differ¬ 
ent frequency regions are probably incorrect, [34] causing 
the regions with an underestimated a to converge faster. 
Note however that, for a given precision, some parts of 
the spectrum may not converge, while others do. Sharp 
features, or parts of the spectrum that are very different 
from the default model, typically reach stability at lower 
values of a and thus require higher precision to converge. 
For instance, we note in Fig. 2(d) that the peak at oj = 1 
converges at a ss 10 5 , while the spectrum at the other 
peak converges at q « 10 7 . Something similar happens 
in the next example discussed below. Finally, below some 
value of a the spectrum eventually behaves in an unpre¬ 
dictable manner as a function of a since the conditioning 
of the linear system becomes bad at very low a. Indeed, 
in the brackets of expression (23) the term proportional 
to a that regularizes the linear system gradually vanishes 
as a decreases. 

Continuing the analysis of the example shown in Fig. 
2, consider in Fig. 3 the normalized distance AG(w„) 
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between the input Green function Gi n and output G ou t , 
as well as the auto-correlation AG 2 (An). Those func¬ 
tions are plotted for three different values of a, namely 
two values above a* as well as the optimal value a*. In 
(a), far above a *, A G(co n ) is a smooth function since 
the difference between G ou t and Gt n is much larger than 
the noise amplitude in Gi n . The corresponding auto¬ 
correlation in (b) is also smooth and broad since the cor¬ 
relations between different frequencies of AG are large. 
Slightly above a *, in (c), AG becomes noisy as Gi n —G out 
becomes comparable to the noise in Gi n . The auto¬ 
correlation is still broad, since AG still has well defined 
structures, but it is becoming slightly noisy. At a*, AG, 
in (e), is essentially noise, as expected from the discussion 
in section IIIB. This is confirmed by the noisy Kronecker 
S shape of AG 2 , in (f). In addition, AG 2 (0) is close to 
1, as expected since it is equal to y 2 /N, where N is the 
number of terms in y 2 , and the standard deviation for the 
data is known exactly in that example (see the discussion 
on the historic approach in section II). 

For the second example, we show in Fig. 4 the an¬ 
alytic continuation of another fermionic Green function 
computed with a spectrum that is the sum of three Gaus- 
sians, on which a constant relative diagonal error of ICG 6 
has been added. The scale in Fig. 4(a) is adapted to the 
two broad peaks while the scale in Fig. 4(b) is adapted to 
the very sharp central peak around w = 0. Note in Fig. 
4(c) the presence of the three regimes in y 2 as a function 
of a, and in Fig. 4(d) the plateaus indicating good quan¬ 
titative precision in the result. The much smaller width 
and much larger amplitude of the central peak compared 
to the two others makes this case difficult from a com¬ 
putational point of view. Indeed, if the grid in the main 
spectral region was uniform, while fine enough to resolve 
the central peak, it would contain thousands of points, 
increasing significantly the computation time. Therefore, 
in addition to the non-uniform grid used automatically 
by the program outside the main spectral region, and 
described in Appendix H, a non-uniform grid within the 
main spectral region has also been used. For that pur¬ 
pose, as explained in section IV C, a vector of the form 
(26) was given to the program, which has generated a 
grid with a step varying smoothly between the provided 
intervals. For the example in Fig. 4, the step around 
w = 0 is Aw = 2 x 1CT 4 while Aw = 0.1 for the peak 
centered on w = —3 and Aw = 0.05 for the one at w = 2. 

The standard deviation of the central peak is equal to 
the temperature T = 0.002, putting the magnitude of the 
first Matsubara frequency ttT outside its frequency range. 
This might seem to make the analytic continuation diffi¬ 
cult for that part of the spectrum since essentially only 
information on the moments of that peak is contained 
in the Matsubara Green function. The central peak is 
nevertheless very well reproduced, as shown in Fig. 4(b). 
It converges for a value of a only slightly lower than the 
two broad peaks, as shown in Fig. 4(d). This is analog 
to what is observed in the example of Fig. 2. 

The temperature in the example of Fig. 4 is very small 



to 



FIG. 4. a) Spectrum obtained with the program at a tem¬ 
perature T = 0.002, with a constant relative diagonal error of 
10~ 6 , b) magnification of the low frequency peak, c) \ 2 as a 
function of a and d) spectrum as a function of a at sample 
frequencies corresponding to the maxima. The value for the 
w = 0 peak is given on the left axis while the values for the 
two other peaks are given on the right axis. 
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compared to the total width of the spectrum, therefore 
the number of Matsubara frequencies that are within 
the spectral range, and which have to be taken into ac¬ 
count, is large. If all the frequencies below the onset fre¬ 
quency of asymptotic behavior ui as were used, the compu¬ 
tation time would be quite high since the minimization 
of Q scales as 0(N% n N u ) if N Un < N„, or 0{N Un N%) 
if N Un > N u (see Appendix F). However, although the 
whole frequency range below u} as contains relevant in¬ 
formation on the spectrum, using all the Matsubara fre¬ 
quencies in that range is not necessary (see Sec. IV C). In 
that example, the spectrum was obtained with high ac¬ 
curacy using a non-uniform Matsubara grid, of the type 
described in Appendix I, which contains 321 frequencies 
instead of about 1000 below uj as . 

Finally, we notice in figures 2(d) and 4(d) that there is 
a range of values of a where the values of the spectrum 
at the maxima are higher than both the exact values and 
the optimal ones (both essentially identical). The struc¬ 
tures in the spectrum in that range of a are therefore 
sharper than those of the exact spectrum. This means 
that, if the data’s standard deviations were comparable 
to the values of |AG| in that range of a, the final Max- 
Ent result would have sharper structures than the exact 
spectrum. Therefore, contrary to a common belief about 
maximum entropy, even for a very smooth and feature¬ 
less default model, the maximum entropy results are not 
systematically smoother than the exact result. Note that 
this “overshoot” of the values at the maxima as a func¬ 
tion of a is not a numerical artefact. The spectra in that 
range are the actual solutions of the equation VQ = 0 at 
each a. This can also be verified by starting the calcula¬ 
tion at the lowest a, using the spectrum obtained at that 
value as the initial spectrum, and recomputing the spec¬ 
trum for increasing values of a instead of the opposite. If 
there is no hysteresis in the sample-frequency curves, as 
we did observe during the tests, then we can have very 
high confidence in the results. 


B. Analytic continuation for single-site dynamical 
mean-field theory 

Finally, we present in Fig. 5 the result of analytic 
continuation of single-site dynamical mean field theory 
(DMFT) data obtained with a continuous-time quantum 
Monte Carlo solver [35] for the one-band Hubbard model 
on the half-filled bipartite square lattice at U = 6t and 
temperature T = O.Olf, where t is the nearest neighbor 
hopping[36]. The covariance matrix was computed using 
the Green functions of the last 24 DMFT converged itera¬ 
tions. The relative error is of order 5 x ICC 4 at most. This 
type of problem was studied in detail at the beginning of 
DMFT when quantum Monte Carlo algorithms did not 
allow low temperature and high precision. [32] However, 
the qualitative form of the spectrum is well known. Here 
we find that, as expected, the peak in the density of states 
at the Fermi level becomes very sharp at low tempera¬ 




FIG. 5. a) Spectrum for single-site DMFT on a square lattice 
at U = 6, filling n = I and temperature T = 0.01, b) y 2 
as a function of a for the above example, c) spectrum as a 
function of a at sample frequencies. The relative error in the 
data is of order 5 x 10 -4 at most. 


ture. This can be obtained even in the presence of broad 
incoherent features at high frequency. 

As shown in Fig. 5(b), y 2 versus a has the same 
general shape as in the previous examples. The spec¬ 
trum shown in 5(a) is the result at the maximum of 
the curvature in logy 2 versus 0.2 log a, as in the pre¬ 
vious examples. That example is particularly interesting 
for benchmarks because particle-hole symmetry makes 
it clear when only information is fitted and when noise 
is also fitted. Indeed, since the noise does not respect 
particle-hole symmetry (the real part of the data does 
not vanish completely), noise fitting starts when the spec- 
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trum stops being symmetric as a decreases. This value 
is easy to find in Fig. 5(c), where the spectrum is plot¬ 
ted at symmetric frequencies on each side of uj = 0, as a 
function of a. We note, by comparing figures 5(b) and 
5(c) that the highest a where the values of the spectrum 
at two symmetric frequencies depart from each other is 
effectively around the maximum curvature in (b). Figure 
5(c) also suggests a relatively good quantitative accuracy 
of the result in 5(a), considering that the input data here 
are the result of an actual physical simulation, and not 
an ideal case like in the previous examples. Indeed the 
curves at sample frequencies have a relatively stable re¬ 
gion as a function of a. However, the absence of flat 
plateaus suggests a precision of the order of 10% at most 
around w = 0. 

We should mention that when the covariance is not di¬ 
agonal, AG is not a smooth function anymore at large 
a because the diagonalization routine sorts eigenvalues 
in decreasing order. This makes the difference between 
the information and noise fitting regimes less clear to 
estimate from that quantity, which therefore becomes 
slightly more difficult to analyze. On the other hand, 
the difference between the auto-correlations AG 2 (An) at 
a > a* and a < a* is still clear, with one exception, 
namely for a particle-hole symmetric case like the exam¬ 
ple we just considered in Fig. 5. Indeed, in that case 
the real part of the Green function should vanish, but it 
is mostly noise because of intrinsic computational errors. 
In addition, because there are correlations in general be¬ 
tween the errors on the real and on the imaginary parts, 
then the transformation (F4) mixes the real and imag¬ 
inary parts of the Green function. This can cause AG 
and AG 2 (An) to look noisy for all values of a. There¬ 
fore, those quantities are much less useful for the analysis 
in the particle-hole symmetric case, including the exam¬ 
ple of Fig. 5. This does not affect however the behavior 
of x 2 and A(cj^ ample ) as a function of a, and therefore it 
does not influence how the optimal a is determined. In 
the general case where both the real and imaginary parts 
have structure, AG and AG 2 (An) are useful, although 
AG can be more difficult to read for non-diagonal covari¬ 
ance. 


VI. DISCUSSION 

The approach presented here relies critically on the al¬ 
gorithms discussed in section III. Indeed, because of the 
ill-conditioned nature of the problem and the bad scaling 
of the minimization problem with grid size, all aspects of 
the calculation must be optimized to compute the spec¬ 
trum in a reasonable time, and without sacrificing ac¬ 
curacy, for the hundreds of values of a required for the 
analysis. 

First, the real frequency grid size can be minimized be¬ 
cause we combine non-uniform real frequency grids with 
hybrid cubic splines in w and u = l/(w—w M ) that are well 
adapted to each other. Cubic splines are the best interpo¬ 


lation approximation for smooth functions on arbitrary 
grids. They allow minimization of the grid density for 
a given precision of the integrals. In addition, the hy¬ 
brid splines are preferable because they give much better 
accuracy than ordinary splines on the type of highly non- 
uniform grids we need. Second, using the Matsubara fre¬ 
quency spectral representation of the Green function also 
helps minimize both the real and imaginary frequency 
grid sizes. Indeed, because the kernel times the spec¬ 
tral weight can be integrated analytically when A(uj) is 
modeled as a piecewise polynomial function, the real fre¬ 
quency grid does not need to be adapted to the kernel, 
but only to the spectrum. In addition, working with 
Matsubara frequency functions instead of imaginary-time 
ones allows a straightforward optimization of the grid on 
the imaginary time axis, first by replacing the frequencies 
in the asymptotic region, iui n > ioj as , by constraints on 
moments and, second, by using an adapted non-uniform 
Matsubara frequency grid below the cutoff iu> as deter¬ 
mined during the moment-fitting procedure. Those care¬ 
fully chosen approximations and optimizations lead to a 
computational complexity that is only weakly dependent 
on temperature and spectrum complexity. 

The minimization algorithm is another key aspect of 
the implementation. Since the spectrum changes only 
slightly between consecutive values of a, the minimiza¬ 
tion algorithm based on Newton’s approach is very well 
adapted. In addition, the singular value decomposition 
takes advantage of the specific structure of the linear sys¬ 
tem to solve it more efficiently. The resulting routine is 
very fast. For example, on a 2GHz laptop, computing 
the spectrum for about 600 values of a with a grid of 
110 real and 95 Matsubara frequencies takes 15 seconds, 
and the same number of values of a for a grid of 550 real 
and 510 Matsubara frequencies takes about 8 minutes to 
compute. As a comparison, the computation time with a 
generic routine based on the trust-region-reflective algo¬ 
rithm that we used in a previous work was a few hours for 
a few hundred values of a and frequencies. [30] Speed can 
be very important for example in the context of modern 
ab initio methods that combine density functional meth¬ 
ods with dynamical mean-field theory. [37] The stochas¬ 
tic approach would not be a good choice in that context 
since the time required for analytical continuation can be¬ 
come longer than the computation time of the Matsubara 
data. [38] 

As discussed in section IIIB, it is a consistency re¬ 
quirement of the maximum entropy approach that a be 
chosen in the region where dlogx 2 /dloga! drops as a de¬ 
creases. This is where the likelihood (8) is actually valid, 
where most of the information has been fitted and where 
noise starts to be fitted as well. However, we still need a 
criterion to choose precisely a within that crossover re¬ 
gion. Since both information and noise are being fitted 
in that region, that criterion is not obvious in general. 
For data accurate enough however, the actual value of a 
within the crossover region is irrelevant since the spec¬ 
trum does not change within that region. This might 
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apply quite routinely to deterministic numerical calcula¬ 
tion, which typically has only small round-off errors, but 
it will apply more rarely to stochastic results. For this 
type of data, when the covariance is reasonably accurate, 
the crossover region is narrow, and the choice of a is still 
relatively easy. Indeed, as discussed in section IIIC, lo¬ 
cal minima in |cL4(u;)/dloga| are usually present in the 
crossover region and thus the spectrum will only have 
small variations over a narrow range of log a. Choos¬ 
ing the maximum of curvature in log y 2 as a function 
of 7 log a will thus give very similar results for different 
values of 7 . Given that the spectrum cannot have large 
variations when the crossover region is narrow and that 
there is still some information to fit within that region, 
it might be preferable to choose a *, the optimal a, closer 
to the noise-fitting region to ensure fitting as much in¬ 
formation as possible. The choice 7 = 0.2 serves that 
purpose. 

It is always preferable to look at the diagnostic func¬ 
tions to verify that they have the correct properties. At 
the optimal a , the function AG should look like noise 
and its autocorrelation should have a Kronecker S shape, 
while the curves A(ui s i amp ) should have either an ex¬ 
tremum or an inflexion point at a*. It may happen that 
AG seems to contain only noise, but its autocorrelation 
has a finite width. This indicates that the covariance 
matrix is not well estimated, in particular if it has been 
assumed to be diagonal, then a non-diagonal covariance 
should be used. Another possibility is that AG never 
becomes completely noisy. This is generally because the 
grid is not dense enough to produce a good fit, and it 

must be refined until AG has the correct properties at 

* 

a . 

The diagnostic tools become even more useful as the 
error estimates are less accurate and the region where 
both information and noise are fitted simultaneously be¬ 
comes wider. They can then be used to choose a such 
that the most relevant part of the spectrum has con¬ 
verged. For example, if we are more interested in the 
spectrum around u = 0 , one can choose a value of a 
for which AG is noisy at low Matsubara frequency and 
A{uj) is the most stable around uj = 0 as a function of 
logo. To allow the user to choose a different value of a 
than the one selected automatically, the program saves 
the results for a certain range of a around that value (by 
default, one decade above and below). For very bad es¬ 
timates of the errors however, some compromise must be 
made because some parts of the spectrum might become 
quite distorted because too much noise has been fitted, 
while other parts have not yet converged. The error esti¬ 
mate is therefore a very important part of the data. As 
for the input data itself, ensuring accuracy of the error 
estimate is the responsibility of the user. As discussed 
in section II and emphasized in the classic review, ] 
the maximum entropy formalism assumes that the data 
obeys Gaussian statistics with accurate estimates of the 
covariance. This can be checked by histogram methods 
and by binning procedures such as those available in the 


ALPS software. [39] In practice, the data are not per¬ 
fectly Gaussian and the error estimate is often not very 
accurate. The diagnostic tools provide at the same time 
a means to evaluate if the error estimate is good and, to 
some extent, to compensate if it is not. 

The question of whether the spectrum should be taken 
at a single value of a or averaged over a certain range is 
certainly relevant for poor estimates of the covariances. 
Since the software saves the spectrum for a range of a 
around the selected optimal value, averaging is a possible 
option. This can however produce unpredictable results 
since overfitted spectra are mixed with underfitted ones. 
On the other hand, when choosing the best a for a par¬ 
ticular spectral frequency range, one neglects the rest of 
the spectrum, but ensures that the part of the spectrum 
that seems the most interesting is as accurate as possible. 

The program can be used both in an interactive way, 
where the user can obtain results in real time and modify 
parameters to see the effect on the results, or in batch 
calculations. The level of automatization is sufficient to 
generate series of calculations without the need to inter¬ 
act with the program, assuming that realistic standard 
deviations or covariance matrices are provided. 


VII. SUMMARY AND CONCLUSION 

With the entropic prior, only the correlations present 
in the data are significant, [8] while positivity and 
smoothness of the spectrum are enforced. Given that 
the maximum entropy approach is very tractable compu¬ 
tationally, it is therefore the best practical choice for an¬ 
alytic continuation of noisy numerical data, whether the 
noise comes from a Monte Carlo procedure or from nu¬ 
merical round-off. The main challenge in that approach 
is to determine a, or a range of values of a, for which 
the result is the most accurate possible. As we have il¬ 
lustrated with examples, consistency with the assump¬ 
tions of the approach is often sufficient to determine the 
optimal value of a. This is possible because, for realis¬ 
tic estimates of the standard deviation (or covariance), 
there exist well separated regions where the behavior of 
X 2 as a function of a differs, depending on whether only 
information is being fitted or only noise is added to the 
fit. Those regions can be clearly identified when x 2 (a) is 
plotted on a log-log scale. The optimal a is in the transi¬ 
tion region between the information- and the noise-fitting 
regions. 

The information- and noise-fitting regimes also have 
distinctive signatures (a) in the normalized distance be¬ 
tween the data and the fit expressed in the eigenbasis 
of the covariance matrix, (b) in the autocorrelation of 
that function, and (c) in sample frequencies of the spec¬ 
trum plotted with respect to logo. Those functions can 
therefore be used as diagnostic tools to assess the quality 
of the fit and also to fine tune the value of the optimal 
a. The spectrum as a function of log(a) at sample fre¬ 
quencies also informs us on the accuracy of the spectrum. 
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Indeed, high accuracy is indicated by high stability of the 
spectrum as a function of log a around the optimal a. 

The accuracy and efficiency of our implementation is 
optimized using approximations adapted to the specific 
numerical challenges of the problem. This includes the 
use of non-uniform frequency grids, moments constraints, 
hybrid spline interpolation and piecewise analytical inte¬ 
gration of the spectral representation. Those optimiza¬ 
tions, combined with a good minimization routine, make 
it possible to obtains high quality results very quickly, 
and therefore to check the sensitivity of the results to 
the grid, the default model, or other parameters, in real 
time. The level of automatization of the program is also 
sufficient for batch calculations. Unlike the classic and 
Bryan approaches, our method to choose the optimal a 
does not require the default model to be close to the final 
answer. An annealing procedure is therefore not neces¬ 
sary to obtain reliable results. 

To illustrate our approach, we have shown three ex¬ 
amples with fermionic Matsubara frequency Green func¬ 
tions: Two fictitious examples with diagonal covariance 
and one example with real data from a physical simula¬ 
tion, with general covariance. The examples demonstrate 
that our approach can produce very accurate spectra 
when the Matsubara data are sufficiently, but realisti¬ 
cally, precise. They also show that our implementation 
can resolve very sharp and very incoherent features at 
the same time without difficulty, including a spectrum 
possessing a low-energy peak of width smaller than nT. 

Our maximum entropy implementation is freely avail¬ 
able under the GNU general public license as a software 
with a complete user interface. It is written in C++, but 
uses Python to automatically plot the results and diag¬ 
nostic functions at the end of the calculation. It comes 
with a detailed user guide and provides default options 
that make it easy to use for beginners. The numerous 
options make the software flexible for the experienced 
user. The software is fast and can handle fermionic and 
bosonic input Green functions, self-energies, and corre¬ 
lation functions, with arbitrary covariance, and whether 
the data are provided in Matsubara frequency or in imag¬ 
inary time. [19] 
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Appendix A: Heuristic derivation of the Maximum 
entropy method 


In this appendix we present a heuristic derivation of 
the maximum entropy method that suggests a connection 
with the stochastic analytical continuation approach. [1 ] 
It follows a derivation of statistical ensembles often 
found in textbooks and based on ideas from information 
theory. [22] 

Since the spectral weight A! (wQ on a frequency grid 
u>i of M points is normalized, it can be thought of as a 
probability Here we take 

M 

A / (cut) A cjj = 1. (Al) 

2-1 


In the main text, A (uj t ) is normalized to 2-7T instead. 

Define a stochastic process with statistically indepen¬ 
dent trials where cq is the a priori probability to fall on 
the interval i. By repeating the trials A times with A 
large, we can discover empirically the probabilities q t . Let 
the empirical probability P* for interval i be 

P t = = A’ { Ui ) Ax* (A2) 


where rq is the number of times that the result i is ob¬ 
tained in the trials (i is between 1 and M). The cen¬ 
tral limit theorem suggests that in the limit A —> oo 
(A >> M ), the most probable value of rq and its aver¬ 
age value will be identical and the corresponding Pj will 
converge to qi , in other words we expect 


lim Pi = 
N—>oo 


lim 

N—too 


Hi (A) 
A 


= qi- 


(A3) 


This can be checked as follows. 

The probability T (m, ri 2 , ■ ■ ■ nu) to obtain rq times 
the result 1, n 2 times the result 2, and so on, is given by 

A 1 

F (m, n 2 ,... n M ) = ~ -j—-r (gi)" 1 te)” 1 ■ • • 

ni!n 2 !... re at 

(A4) 

subject to the constraint 


M 

J2n t =N. (A5) 

i=1 


The most probable values can be obtained by maximiz¬ 
ing the logarithm. Stirling’s formula is then extremely 
accurate since A tends to infinity. Rewritten, it is 


lnT (rq, n 2 ,...) = InA! — lnrq! + rq In 


Qi 




= -A p i In 


Qi 


(A6) 

(A7) 
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which is called the relative entropy. We just need to 
maximize this quantity while satisfying the constraint 
Eq.(A5) which we rewrite in the form 

NY,Pi = N. (A 8 ) 

r 

Using a Lagrange multiplier, it is easy to find the ex¬ 
pected solution Pi = qi Eq. (A3). 

In statistical mechanics, the canonical ensemble is ob¬ 
tained by assuming a) that i labels microstates that are 
equiprobable (all qi are identical) and b) that, in ad¬ 
dition to the normalization constraint, there is another 
constraint, namely the average energy is fixed, 

53 = N J2 EiP * = NE - ( A9 ) 

i i 

The inverse temperature /3 is the Lagrange multiplier and 
the final result for Pi is given by the canonical distribu¬ 
tion. Note that N drops out of the final answers. It is 
just a conceptual device. 

Coming back to our problem, we must identify the con¬ 
straints and choose the a priori probabilities qi. There 
are two constraints, normalisation and the value of y 2 
which is quadratic in Pi. The parameter N/a plays the 
role of the Lagrange multiplier for the constraint on y 2 . 
In this case however, we do not know the value of y 2 , 
hence we must find a way to determine a. In the his¬ 
toric approach, one assumes that y 2 equals the number 
of Matsubara frequencies (not to be confused with N ), 
which fixes a, but there are several alternatives and our 
paper proposes a new one. Concerning the a priori prob¬ 
abilities, if we assume that all real-frequency grid points 
are equally probable, the qi are all equal to 1/M, with 
M the number of grid points, and the relative entropy 
becomes 

lnT(ni,n 2 ,...) = -N 53 Aw.j A' (<*;*) In ■ 

(A10) 

In the language of maximum entropy methods, our de¬ 
fault model is then D (wj) = l/(MAwi) and the choice 
of grid determines the default model. [16] It is clearly 
preferable to choose qi = D (w,) A u>i which decouples the 
choice of grid and of default model. In this case we have, 

lnT (m, n 2 ,...) = -IV 53 ^i A ' ( w ») ln ( A j) ) ’ 

(AU) 

which is the form used in the maximum entropy method 
before imposing the normalization constraint with a La¬ 
grange multiplier. As mentioned in section III A, instead 
of using a Lagrange multiplier, a constraint on normal¬ 
ization in y 2 can also be used. 


Appendix B: Extracting moments from G 

This appendix explains the procedure to extract the 
moments from either G(ico n ) or G(r). 


1. From G(iuj n ) 


For frequencies ui n > W, where A(|w| > W) « 0, the 
spectral form of the Green function, 


G( “" ] =L to 


duj A(w) 


lOJ n — cu 


(Bl) 


can be written as 


du .. . ( 1 u 

G(iu} n ) — — A{w) I — - 1 - ——+ 

J-OO 2tT \lU} n (lUnY 


UJ 


1 [°° du .. , 1 

iun . 27T 


(*W „) 3 

00 duj 


(iun) 2 J -oo 2 tt 

1 diO 2 . . , 

+ T' -V3 / W + • • • ) 

(lUnP J-oo 2tT 


wA(w) 


or 


M 0 Mr 

G\VjJ n ) — - -1- — 


Mo 


iuj n (tw„) 2 (?A>„) 3 


(B2) 


(B3) 


where Mj is the j th moment of A(u>). If the Green func¬ 
tion is known for a certain frequency range in that asymp¬ 
totic region, the first N moments can be deduced by fit¬ 
ting the function 


G as (icu n ) 


Mn 


ILUn 


Mi 

(iw n ) 2 


...+ 


Mm- i 
(iujn) 


N 


(B4) 


to that part of G.j n , assuming that the terms 
Mjv/ (iou n ) N+1 and Mm+i/( iu n ) N+2 can be neglected for 
frequencies larger than a certain ul- 
If we write G as (ioj n ) as 


Re[G as (zcai.)] 

Im[Gas(iu>L)} 
Re[G as (iUL+l)} 

Im[G as (*Wi+i)] 


0 ' 


0 

1 

UL +1 


~72 - 

J L +1 

0 


0 


0 

1 

— 

V L + 1 

0 


(-1) J 


0 

(- 1 / 


(- 1 ) 


N 1 

21V — 1 

L + l 


,2N 

°L +1 


Mo 

Mi 


M 2 N -1 

M 2 N 


or 


G as = XM, 

the moments are obtained by minimizing 

Xm = {Gfn ~ XM) t C]j 1 f (G® f - XM) 


(B5) 

(B6) 

(B7) 
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where the superscript or subscript HF, for high- 
frequency, indicates that co n > lol- To minimize Xm, 
we have to solve the system dy/ 2 M /dMj = 0, which, using 
the fact that C is symmetric, is written as 

X T CffpXM = X T C„VG" F . (B 8 ) 

This linear system can be solved if only moments of 
low order are taken into account. Otherwise the ma¬ 
trix X t C^ 1 f X becomes too ill-conditioned. This is of no 
consequence in practice however since only the informa¬ 
tion on a few moments is present in noisy data. To find 
the onset u>l, the fit is done repeatedly, while sweeping 
the starting frequency from small to high until the result 
of the fit is stable. 

We also need the covariance for the moments. It can 
be found from the covariance of the Green functions 

C HF = <(G^ f - (G ff ))(G^ f - <G FF )) T ) . (B9) 

By multiplying the above by X T C/^ F from the left and 
by the transpose of that same matrix from the right and 
then using (B 8 ) to relate M to G FF we obtain 

C M = <(M - <M))(M - (M)) T ) = (X^C-^X)" 1 . 

(BIO) 


2. From G(r) 


Using the spectral form 


f duj e~ ulT A(uj) 

J 2n 1 ± e~P w ’ 


(Bll) 


where + is for fermions and — is for bosons, we easily 
obtain the moments 


M 0 = - [G(0) ± Gm , 
Mi = [G'(0) ± G'm , 


Mj = (-i y +1 


G u \0)±G u \/3) , 


(B12) 


where ± has the same correspondence with fermions and 
bosons as in (Bll), and where we defined 


G (U( t ') 


<P'G(t) 


dri 


(B13) 


Now, when r < 1/W, the exponential in the integrand 
of (Bll) can be expanded around r = 0, and similarly 
around r = /3 when /3 — r < 1/W since the Taylor series 
of G(t) converges in those regions of r. For r < 1 /W, 
we thus have 


and for /3 — r < 1/W, 


G(r) = G(/3) - G\m ~ r) + - r ) 2 


G< 3 ) (p) 


(/3-r ) 3 + ... (B15) 


or 


G(/?-t)=G(/?)-G'(/3)t + 


G ( 2 ) (/ 3) 2 G( 3 )(/ 3) 


6 


T° -(-... 


Now, adding and subtracting (B14) and (B16) gives 


(B16) 


G(r) + G(f3 - r) = [G(0) + G(/3)] + [G'(0) - G'm r 


2 L 


G ( 2 ) (0) + G ( 2 ) (/3) 


r 2 + 


6 L 


G ( 3 ) (0)-G ( 3 ) (/3) 


(B17) 


and 


G(r) - G(/3 - r) = [G(0) - G(/3)] + [G'(0) + G'm r 
~ [G ( 2 ) ( 0 ) - G ( 2 ) (/3)] r 2 +* [g ( 3 ) ( 0 ) + G ( 3 ) (/3) 


(B18) 


From (B12) we see that the moments are obtained from 
(n!), where n is a moment order, times the coefficients of 
the polynomial fits to G(r)+G(/3—r) and G(r) — G(/3—r) 
in the range r < 1/W. For fermions, we use the even 
powers of r in the fit to G(r) + G(/3 — r), and the odd 
powers in the fit to G(t) — G(/3 — r). For bosons, it is 
simply the opposite. 

As in the fit of the asymptotic form (B4) to G(iw n ), a 
weighted least-squares fit can also be used here. However, 
the procedure is slightly heavier because both the order of 
the polynomial and the number of points must be varied 
to ensure a stable result is found. If there are enough 
G(ri) with Tj < 1/W, accurate stationary values of the 
moments are obtained. 

For the least square fit we need the covariances of 
G(r)+G(/3-r) and G(r)-G(/3-r). For G(r)+G(/3-r), 

G+ = <[AG(tO + AG(/? - 7-0] [AG(tO + AG(/3 - r,)]> 

- (AG(tOAG(tO) + (AG(rj)AG(/3 - Tj)) 

+ (AG(/3 - tOAG(tO) + (AG(/3 - n )AG(0 - r 3 )) 

(B19) 


and for G(r) — G(/3 — r), 

C~j = <[AG(rO - AG(/3 - r,)] [AGfo) - AG(/3 - r,)]) 

= (AG(ri)AG(rO) - (AG(t z )AG(( 3 - Tj )) 

- (AG(13 - r<)AG(rO> + (A G(/3 - n)AG(/3 - Tj )) 

(B20) 


G(t) = G(0) + G'(0)r 


G^ (0) _ 2 | G( 3 )(0) r 3 | ^ 


(B14) 


where A G(t/) = G(r*) — (G(ri)). Now G(/3) is related 
to G(0), so that elements containing AG(/3) can be ex¬ 
pressed using AG(0). 


2 


6 





















For fermions, we have 


becomes, after integration by parts three times, 


G ab (P) = ~({A,B}) - G AB (0+), 
and for bosons, 

Gab(0) = {[A,B])+G ab ( 0 + ). 


(B21) 


(B22) 


G(iw„) = - 


G(0) ± G((3) , S[(0) ± S' N (p) 


lOJ n 


( iu n y 


5"(0) ± S^{p) 1 - e iUn -P/ N 


(iu] n ) 3 


(iU] n ) 4 


N -1 

£< 

J=0 


3) 

3 i+i 


(C4) 


Since the (anti-)commutators are canceled when (G(/3)) 
is subtracted from G(/J), we obtain, for Tj ^ ft, 

(AG(/3)AG(tj-)) = - (AG(O)AG(tj)) (B23) 

for fermions and 

(AG(P)AG(tj)) = (AG(O)AG(rj-)) (B24) 

for bosons. 


Appendix C: Computing G{iuJn) from G(t) 

To obtain G(iw n ) from a G(r) known only on a discrete 
set of points, we need an interpolating method. Simply 
using a discrete Fourier transform would produce a peri¬ 
odic function in w n , while Im[G(*w ra )] must decrease like 
1/ujn and Re[G(iw n )] like 1/w^ at high u n . We can use 
a cubic spline as the interpolating method. This has the 
advantages of recovering automatically the asymptotic 
behavior for the Fourier transform and of giving good 
accuracy in the whole frequency range. [30, 3‘. 

Suppose we have G(rj), for i = 0... TV, we need 4 N 
equations to define the spline. If 5j(r) is the cubic poly¬ 
nomial in the i th interval, the equations 

Si(n-i) = G(n- 1 ), 

Si(Ti) = G(Ti) , 

^(r i _ 1 ) = ^_ 1 (r i _ 1 ) > 1 J 

S"{Ti- i) = 5"_ 1 (r i _i), 


where Sj is the third derivative of Sj (r) and we assume 
that the imaginary time step is constant. Using (B12) 
and (C2), this expression can also be written as 


nc \ — M ° _l 
GyiLOn) — -h 

lUJ n 


Ml 

(w „) 2 


M 2 

(iw „) 3 


X _ e iu> n P/N 
(*Wn ) 4 


N— 1 

£< 

3=0 


nT i S 


(3) 


J + l 


(C5) 


The sum in the last term is a discrete Fourier trans¬ 
form that can be computed with a fast Fourier transform 
(FFT) routine. Using a cubic spline as the interpolation 
function therefore allows to set the norm and the first two 
moments of the Fourier transform to the correct values. 
It is also very efficient because we end up with a FFT. 

We also want to obtain the covariance matrix of G(iuj n ) 
from the one for G(r), which is assumed to be provided 
as input data. We start with 

C^T,) = ([G(t0 - G(n)] [G(tj) - G(t,-)] ) , (C6) 

where G(t{) is the expected value of G{ji). In this case 
we replace the continuous Fourier transform by a discrete 
one. We then have 

Grr(uji ,uj m ) = 

(Re [G(iuji) - G(iu>i)\ Re [G(iw m ) - G{iu m j\) 

= El COS (uiTj) COS (u m Tj)C(r u Tj) , 

' ' in 

(C7) 


for * = 1... IV, give 4 N — 2 equations. To obtain the last 
two equations, we can use the first moment M\ and the 
second M 2 of the spectral function. Using the definitions 
(B12), we have 


S[(0) ± S' N (p) = M, 

S"(0) ± S" (/3) = -M 2 . 


(Re [G(iwi) - G{iui)\ Im [G(iw m ) - G(iu m )]) 
= Y. COS (bJlTj) sin(q } m Tj)C{Tj, Tj ), 

' ' in 

(C 8 ) 


where the + sign is for fermions and the — for bosons. 
Knowing the spline, the Fourier transform, 


fP . 

G(ioj n )= dr e lu>nT G(r) 

Jo 


and 

Ci i (wi,w m ) = 

(Im [G(iuii) - G(iu>i)] Im [G(iui m ) - G(iw m )]) 


( PX 
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Those matrices can also be obtained more efficiently by 
first calculating 

C(Ti,U> m ) = (f) E Tj) , (CIO) 

then 

C R (ui,u m ) = (4) E e< " ,TlRe [ C '( Ti ’ w "*)] ( Cn ) 

^ ' i 

and 

Ci{ui,w m ) = (4) E e< " ,T * Im [ Cr ( T <> w "*)] ’ ( C12 ) 

' ' i 

so that 

CRR(ui,Wm) = Re [Cr(wi, u) m )\ , 

Cri(ui, uo m ) = Re [Ci(ui, Lo m )\ , (C13) 

Cu(ui,U)m) =Im[C;(w 1 ,W m )] . 

Appendix D: Estimating the width and weight of a 
quasiparticle peak 


where Mj P is the j th quasiparticle peak moment. 

On the other hand, the contribution to G from the 
incoherent part of the spectrum is 


Ginci^n ) 



duo A inc (uj) 
2n iu> n — LO 



dco A^ nc (cc) 
27T lLO n — LO 



duo A inc (u}) 

27r lLO n — LO 

(D3) 


Now, for io n < Wi nc the kernel can be expanded in powers 
of u> n /uo, 


Gindin) — 

f~ Winc du_ A inc (u) 

J- oo 27r w 

f duj Aj nc (cu) 


>Wi, 


2tt 


1 i 

LO 


i _)_ l ddL 

LO 


IU r 
LO 


(D4) 


which gives 


If the spectrum has a well defined peak around io = 0, 
it is possible, under two conditions, to estimate its width 
and weight. Those conditions are 

1. There is a frequency interval W qp < |w| < Wi„ c 
where A(co) « 0, such that the contribution to 
G(ibo n ) from this part of the spectrum is negligi¬ 
ble. 


2. A sufficient number of Matsubara frequencies sat¬ 
isfying the condition W qp < io n < Wi nc exists, such 
that G(ico n ) is represented by a unique Laurent se¬ 
ries in that interval. 


The treatment is however slightly different for fermions 
and bosons. In the latter case, what we call the spectrum 
is in fact A(lo)/lo. Let us start with the fermionic case. 

If the spectrum is made of a well defined peak around 
zero frequency and a finite frequency part that is usually 
incoherent, the Green function can be written as 

dco A qp (co) + f°° Ajnc{u) 

2ir iuo n - lo J _ 00 27 r iio n - lo (Dl) 
= Gqpi'i'LJn') + Gi nc {iU0 n ') • 



Then, since A qp {\w\ > W qp ) ~ 0, for frequencies Lo n > 
W qp , a large uo n expansion can be used for the kernel in 

Gqp{lLOn ), 




Mg p 

lLO n 


dLO Aqpiio) 
2n ico n — lo 


doo 

2n 


A qp (L0) 


M\ p 

(iuo n ) 2 


+ 



lL0 n 

M f 
(iLOnf 


LO 

( iu n ) 2 


+ . . . , 


+ 



(D2) 


G inc (iuo n ) = -M™ c - M% c (iL0n) - M™ c {iL0 n ) 2 + ... , 

(D5) 

where the M* nc s are inverse moments of Ai nc (uo). 

We finally obtain the following Laurent series form for 

W^qp <C LO n <C W^inci 


G LS {iu n ) «... - M l _ n 3 c (iL0 n ) 2 - M l _ n 2 c (iu0n) - ATT 


M 0 gp Mf p 
iuj n 


(iio n y 


+ ... (D 6 ) 


By fitting this form to G(ico n ) in the frequency range 
Wqp < uo n < W inc using the same method as the one 
used to fit the moments in Appendix B 2, we can ex¬ 
tract the moments of the quasiparticle peak. The weight 
of the peak will then be given by Mq P , its position, by 
Mf p /M q P , and its width by a/M| p /Mq P — (M^ p /Mq P ) 2 . 

There is no way however to be sure that the conditions 
1 and 2 are fulfilled by looking at G{iuo n ) only, without 
trying to fit the form (D 6 ). When the weight Mq P of the 
peak is large enough however, the imaginary part will 
seem to diverge at small uo n . Therefore we know if there 
is a peak, but we cannot know from the shape of G(iL 0 n ) 
if it is isolated enough from the rest of the spectrum so 
that the above procedure will work. 

In addition, if conditions 1 and 2 are satisfied, the op¬ 
timal numbers N qp and N inc of moments Mj p and M“ c , 
respectively, and the frequency range [ W qp , Wi nc ] satisfy¬ 
ing conditions 1 and 2 are not known in advance. Those 
parameters are to be determined during the fitting pro¬ 
cedure. In our code, the fit is done repeatedly by vary¬ 
ing those parameters, and the optimal values are deter¬ 
mined by a stationary point. If the fit is attempted and 
no stationary point is found, then conditions 1 and 2 
are not satisfied. In particular, if there are not enough 
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Matsubara frequencies satisfying W qp < uj n < Wi nc , the 
stationary character becomes ill-defined, and no unique 
result can be found, hence the importance of condition 
2 . 

For bosons, we consider the positive function A(w) = 
A(uj)/uj as the spectrum. If conditions 1 and 2 are satis¬ 
fied, the spectral form of G thus becomes 

dui wA w (w) I" 00 did ujA inc (id) 

2ir iuj n - id J-cc 2ir iid n - id (D7) 
= GqpilUJn) -}- Gi nc {iid n ) . 

From that expression, we deduce that the order of the 
moment associated with a given power of iid n in the 
Laurent series is increased by one with respect to the 
fermionic case. We thus obtain directly 



Appendix E: Defining the kernel matrix 


We want to approximate the form 

did 


G( "” ) = L to 


lOJ n — UJ 


(El) 


when A{u>) is known only on a discrete set of frequencies 
uij. To do that we use a cubic spline model for A{id) and 
integrate analytically (El). However, instead of using 
the same form for the polynomials Sj{id) in the whole 
frequency range, we take 

Sj(oj) = aj(oj — u>j) 3 +bj(od — idj) 2 + Cj(cj — uij) + dj (E2) 


for udi < id < io r and 


C£s(*w n ) ~ 


Note that we still use the qp notation but the low 
frequency-peak in the case of the conductivity, for ex¬ 
ample, would be a Drude peak. 

In that case, we do not obtain M$ p from the fit since 
for the qp part we are in the large ui n limit. However, 
since 


.. - M™ c {iid n y - M™(iu n ) - M™ 


iid n 


M™ 

(*w n ) 2 


+ ... (D8) 


/ OO 7 

tt-A qp (id), (D9) 

-OO 

we have 

Mg p = —G(iu) n = 0) - M™ c , (DIO) 

and we have all the parameters necessary to compute the 
peak position, width and weight. 


Sj{u) = a'j(u — Uj) 3 + bj{ u — Uj) 2 + c'j { u — Uj) + dj (E3) 
for id < u>i and w > ui r , 


UJ — UJQI 


id < id i 
id > id r . 


(E4) 


where w; and id r are defined in Appendix H. We use id as 
the integration variable in the central region and u on the 
left- and right-hand side regions. Thus, (El) becomes 


G{iid n ) 


1 r Ul du A(u) 

27r J Q u 2 iidn - 1 /u - idol 


J_ r did A{id) 1 r° du A(u) 

2nJ 2tt iid n — id 2tt J u 2 iid n - l/u - id 0r ’ 

(E5) 


where ui = 1 /(w; — idgi) and u r = 1 /{id r — u>o r )- Let us 
assume for now that the coefficients in (E2) and (E3) are 
known. On the left-hand side we have 


G i {iid n ) 




r u i +1 ajU 3 + bjU 2 + CjU + dj 
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iidn -idol {iidn -idol ) 2 
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Udn (aJqI 


+ ( - ln(u) + In [1 - u{iid n - w 0 ;)] ) 


U j +1 
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(E6) 
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where L is the number of intervals on the left and a,j , bj , Cj and dj are the coefficients obtained after expanding (E3) . 
In the center, we obtain 


1 


L+M 


G c (iw n ) — — 

= — V 

2 t r ^ 

= -V 

9ir 


f u,, ’ +1 dui aj(io — iOj) 3 + bj(uj — ojj) 2 + Cj(u> — utj) + dj 
2n 


iuj n — u> 


j — L+1 

L + M ru}j + i—u)j ^ a 3 _|_ ^ w 2 _|_ c _|_ j . 


j—L + 1 

M r 


10 


2n iuj n — uij — u> 


i =i 


w 2 w 3 

-(*w n - Wj) 2 a; - (iu>n - uij) — --- {iui n - uij) 3 ln[-iw„ + Wj + w] 


Ul j + 1 -LJj 


— {iujn — 0 Jj)u> ---( iui n — uij ) 2 ln[— iui n + Wj + u\ 


W i+1- W i 

J 0 


(E7) 


— cu — (iu} n — uij) ln[— iui n + ujj + u\ 


6c>j-)-i Uij 


Ci 


— ln[— iui n + uij + w] 


Ulj + 1 —Ulj 


dj , 


where M is the number of intervals in the central region. 
Between the first and the second line, we have made the 
change of variable (w — uij) —* ui in each interval. For 
the right-hand side, the expression for G r is the same 
as (E 6 ), except that uiqi is replaced with w 0r and the 
sum goes from L + M + 1 to L + M + N, where N is 
the number of intervals in that region. Then, we have 
G(iui n ) = Gi(iui n ) + G c (iuj n ) + G r (iui n ) and, forming a 
vector r with the coefficients aj , bj, Cj and dj of all the 
intervals, we obtain, in matrix form, 

G = K.r (E 8 ) 

where /C is the matrix obtained from expressions (E 6 ), 
(E7) and the expression similar to (E 6 ) for the right-hand 
side of the grid. 

The coefficients are the solution to the equations 


which correspond to 

lim u 2 dA(ui)/du = 0, 

, (Ell) 

lim u 2 dA(ui)/duj = 0 . 

to — y —oo 

The linear system that gives the spline coefficients in 
terms of the spectral weight has the form BY = TA, 
where TA is a vector with elements that are equal either 
to values of A or to zero. Finally, 

G = KA , (E12) 

where 

K = KB~ 1 T. (E13) 

is the kernel matrix. 


Sj{uj) = Afa) 

Sj(ujj + i) = A(ujj + 1) 

Sj(. u j+i) = Sj+i&j+i) (E9) 

Sj( u} j+ 1) = ^+i( w i+i) 

j = 1, ■ • ■ ,A/\ 

which provides 4J\f — 2 equations, where A f = L + M + N 
is the total number of intervals. The last two equations 
can be taken as 


dA{u) 


du 

dA(u) 


i=0+ 


= 0 . 


du 


u— 0“ 


= 0 , 


(E10) 


Appendix F: Minimizing |X 2 - aS 

We want to minimize 

Y 2 

Q = Y~ aS 

= ±(G-KA) T C~ 1 (G-KA) (pi) 

+ aV A UiA(ui) In ■ 

This form of \ 2 requires 0(N 2 ) operations to compute, 
which is not optimal for numerical calculation. Instead, 
if we diagonalize the covariance C to obtain 

C = U t CU, (F2) 
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we can rewrite x 2 as 


Then, (F6) can be written as 


X 2 = (G-KA) T (G-KA), 


where 


(F3) 


K t K + OjAcuA 


SAj = K t (G — KAj_i) 


— atj (Aw In (D 1 A J _i) + Aw) 


(F8) 


G = Vc^UtG, 

_ (F4) 

K = VC-n^K, 


and x 2 is computed in 0{N u]n ) operations instead. Thus, 


Q = ~(G-KA) t (G-KA) 


A 4/ ,, A(ui) ( F5 ) 

+ «T Awi^(wi) In ——r , 

L)\Ui) 


is the function to minimize with respect to elements of 
A. This is done by solving VQ = 0, namely 

— K r (G - KA) + a (Aw In (D _1 A) + Aw) = 0 , (F6) 


where Aw and D are the diagonal matrices with Aw and 
D as their diagonal, respectively. 

Now, suppose we know the spectrum Aj -i at Oj-i and 
want to obtain the spectrum Aj at ay such that Aj differs 
only slightly from Aj- 1 . We can write Aj = Aj- 1 + SAj 
and we have 


in ( »in ( • 

V D(ui) ) \ D(wi) ) Aj—i (wj) 


(F7) 


where Aj_i is the diagonal matrix made from Aj-i. This 
equation can be solved iteratively to obtain Aj, corre¬ 
sponding to aj, starting from Aj_i, corresponding to 
aj- 1 - This means that, after the first iteration, Aj-\ in 
Eq. (F8) is replaced with AJ + SAJ obtained at the n th 
iteration. The iterations stop when SAj is vanishingly 
small. 

However, even though the original expression for the 
entropy part prevents A from being negative, the solu¬ 
tion to (F8) can produce negative values because of the 
approximation (F7). Since, in any case, the argument of 
the log on the right-hand side of (F8) cannot be smaller 
than the smallest representable floating point number 
fmin on the computer, one solution to that problem is to 
smoothly continue the log by a quadratic function when¬ 
ever Aj(u)j)/D(u>j) < fmin- Although it will not com¬ 
pletely prevent negative values of A(wj) from appearing, 
the parameters of the quadratic function can be chosen 
to strongly penalize those values, so that only rare nega¬ 
tive values of very small magnitude will appear in regions 
where A(wj) should vanish, values that can thus be con¬ 
sidered indeed as vanishing. Therefore, we replace the 
entropy part in (F5) with 

£ = (F9) 


where 


f-AwjA(wj)ln A(w») > A min (wj), 

|_('c2^’i)[ j4 (w.) - A min (wi)] 2 + ci(wj)[A(wj) - A min (wj)] + c 0 (wi)^ A(w*) < A min (w, ; ), 

with A min (wi) = and thus 

dSi ^ f ^AWj In ^ D(o;,)) A A(w^) > Amin(^i) 5 

dA ( u i) ,J |-^c 2 (o; i )[A(w i ) - A mm (w*)] + ci(wj)^ A(w;) < A min (w;). 


(F10) 


(Fll) 


The matching of the derivatives at A(wi) = A m i n (wi) 
gives ci(wj) = Awj(ln(/ m j„) + 1). As for c 2 (wi), it must 
be large enough to avoid negative values of A, but not 
too large, otherwise it would degrade the system’s con¬ 
ditioning. 

Using (Fll), the system (F8) becomes 


K 1 K + MJ 


SAj =K T (G-KAj- 1 )+a j S' i _ 1 , (F12) 


where 

(F13) 

and Sj-j = VSj-i is the gradient of the entropy (F9) 
evaluated at Aj_i. 

Because the elements in Mj can differ by several orders 
of magnitude, the system (FI2) seems ill-conditioned. 
This problem can however be overcome by a precondi- 


= “A 



A(w;) > A n 
A(w;) < A„ 
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tioning step. If we write the right-hand side as Bj , (F12) 
can be written as 


(Mf) K t K J(Mf) +1 


^MfSA 

= A"/) A 


J > 


(F14) 


where / is the identity matrix. Then, we use a singular 
value decomposition (SVD) to write 


K\/(M/) 1 = VjKjVj , (F15) 


where Uj and V, are orthogonal matrices and Hj is di¬ 
agonal, so that (F14) can be written 


[ K JKj + 1] Vj yfitfSAj = Vj yj{MfY'Bj . (F16) 


Finally, the solution 


For the above method to work, the initial value of a 
must be chosen such that the entropy term dominates. 
Using the SVD for the default-model region 

KVe-iAw"^ = U dkdVd , (F18) 

we use ciinit = 100 max^t^nr))- This ensures that the so¬ 
lution is very close to e~ 1 D when the computation starts, 
and that ainit is not too high, which would uselessly make 
the computation longer. 

We know that the computation is over when 
d log x 2 /d log a is very small compared to its typical value 
in the information-fitting region. The condition we use 
to stop the computation is when d log y 2 /d log a becomes 
smaller than a certain fraction, say 0.01, of its maximum 
value. 


Appendix G: Recursive solution for the spectrum A 
in the limit of small changes in a 


Mi = \ 'v, Kj + 1] 1 Vjv (.'/;) 

(F17) 

is easy to obtain since [kJK j + i] is diagonal. 

The value of Aj is obtained by iterating (F17) un¬ 
til SAj « 0. The convergence criteria we use is 
Aui\SAj\ < toUA, where tol^ = 10~ 12 by default. 

Assuming that N Un < N^, where N Un is the number 
of Matsubara frequencies and N^, the number real fre¬ 
quencies, the SVD solves the system (F14) in 0(N^ lri N u] ) 
operations. This is more efficient than a more direct 
method, like Gauss elimination, that would take 0(N%) 
operations. Having the condition N Un < N u is prefer¬ 
able in general, to ensure that there are enough degrees 
of freedom in the spectrum A to capture all the struc¬ 
tures contained in the Green function G. If N u < N Uri , 
then the SVD computes the solution in 0{N UJn N 2 ). 


In this appendix we derive Eq. (23) which is useful to 
understand the changes of y 2 in the noise-fitting region. 
This expression is not used in the code itself. Defining 

a.j = aj-1 — A aj , (Gl) 

expression (F8) can be rewritten as 


K J K + a,AuAl 


SAj = K t (G — KAj_i) 


— aj-i (AwIn (D 1 Aj-i) + Aw) 

+ A aj (Aw In (D -1 A,_i) + Aw) . (G2) 


Now, if Aj _i is the spectrum that minimizes (FI) when 
a = aj- 1 , from (F6), the first two terms on the right- 
hand side cancel each other and we obtain 


SAj = Aay 


K t K + ajAcoA-\ 


1 “I 


(Awln(D 1 A J _ 1 ) + Aw) . 


(G3) 


Hence, in the limit of small Aa, the variation in A does not depend explicitly on the input Green function, and the 
spectrum at aj is given by 


Aj — Aj_i -f Attn 


K t K + Oj-AwAt^j 


i -i 


(Awln(D 1 Aj_i) + Aw) . 


(G4) 


The usefulness of this result is that it shows that the 
variation 8A is a smooth function of w if Aw, D and the 
current spectrum A are themselves smooth. Therefore A 
will have the tendency to remain a smooth function when 
a decreases, even in the overfitting regime. This behavior 
in the evolution of A as a function of a is very useful in 
practice. Indeed, on one hand, a smooth spectrum is 
generally desirable and, on the other hand, because it 


prevents fitting the noise easily, it provides an easy and 
efficient method for choosing the optimal value of a, as 
discussed in Sec. IIIB. 
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Appendix H: Non-uniform real frequency grid 


Let us define three regions for the grid: from — oo to wz, 
from wz to w r and from w r to oo. Then, let us assume that 
the step in the central region varies smoothly between 
Awz and Aw r . Now for w < w; we define 


u = - (HI) 

10 - w 0 z 

and, for w > w r , 

u= - 1 -. (H2) 

LU — LOQr 


Let us consider the left region. Assuming a constant step 
Azz, and that uj m in is the first finite frequency of the grid 
on the left, we want to determine woz, and the number of 
values of zt, N u , such that Aw is equal on the left and on 
the right side of w;. Those conditions give the step 


Azt = 


1 

Oil — W 0 Z 


1 

W; — Awj — Wqz 


_Aw;_ 

(wz - Awz - Woz) (wz - Woz) ’ 


(H3) 


and the grid 


u = Azz, 2Azz,..., N u Au , (H4) 

so that, given that the first finite value of u is Azz, 


1 


^min — i ^01 


(wz - Awz - ajpz) (wz - Wp;) 
Aw; 


(H5) 


+ Woz 


and thus, 


Woz = wz + \J Aw; (w; - w min ), (H6) 

which is the solution such that zz stays finite for all w < 
wz- For now, Woz is a temporary value. For N u , from 
(H4), the definition of u (HI), and the fact that Aw is 
equal on each side of wz, we have 

N u Au = --, (H7) 

wz - Aw ; - w 0 z 

which gives, using (H3) for Azt, 

N u = W01 ~ Ul . (H8) 

A w; 

Now, if we substitute (H6) in that expression, N u in gen¬ 
eral is not an integer. Therefore, we define N u as 

iv “ = “ il (^)' (H9) 

We then redefine w 0 ; as 

wqz = wz + N u Aloi , (H10) 


20 

15 

3 
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FIG. 6. Example of frequency grid density 1/Aw as a function 
of w around the central part of the grid. In this example, the 
user has provided parameters defining a grid with different 
constant frequency steps in different intervals in the form of 
Eq. (26). The program matches smoothly the steps in the 
different regions using a hyperbolic tangent shape. Then the 
distance Azt = l/(wj+i — w M ) — l/(wz — w^), (with w^ fixed 
by the cutoff), is constant outside the central interval [—5, 5]. 


and ijJmin as well using (H5). The procedure is the same 
for the right region of the grid. 

Figure 6 shows 1/Aw as a function of w around the 
central region for such a grid. The central region is also 
non-uniform, with a step that varies smoothly between 
the subregions of constant step defined by the user in the 
form given by Eq. (26). In practice it is recommended 
that the ratio between step sizes of consecutive frequency 
ranges does not exceed a factor of four. Large ratios can 
lead to spurious oscillations in the spectral weight. 

This is the natural grid corresponding to the spline 
used to model A(w), and described in Appendix E. It 
keeps the number of points very low while keeping the 
integrals of A{ w) very accurate for smooth spectra. Us¬ 
ing a continuous step is also necessary to avoid spurious 
oscillations in the results. 


Appendix I: Non-uniform Matsubara frequency grid 


Let us assume an initial number of Matsubara frequen¬ 
cies A/) + 1, where Nq = 2 r with r an integer. A non- 
uniform Matsubara frequency grid can be generated as 
follows. Define 

N\ = : Ni +1 is the number of adjacent frequencies 

close to the first frequency zwo, where m is an integer that 
satisfies 0 < m < r, 

N 2 = 4r : number of frequencies in each subinterval 
with a fixed spacing between Matsubara frequencies, and 

N = Ni + 1 + TOA 2 : total number of frequencies. 

Then, the Matsubara frequencies ioo n (j) are defined us- 
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FIG. 7. Example of non-uniform Matsubara frequency index 
grid. The vertical axis is the index of the Matsubara frequency 
and the horizontal axis numbers the frequencies that are kept. 


mg 


j = 0,... ,Ni — 1, 


n(j) = 


2 l i +1 mod(j - Nx, N 2 ) + N x 2 l i , 


where lj = floor , j = N lf ..., N - 1, 

(II) 

Figure 7 shows n(j) for Nq = 1024 and Ni = 16, which 
gives a total of 65 frequencies. 

The above non-uniform grid is used if N Uri , the number 
of Matsubara frequencies in the data, exceeds a maxi¬ 
mum default value N d chosen typically between a few 
hundreds and a thousand. This default value can be 
modified by the user. To generate the non-uniform grid, 
we first choose r such that ( N 0 = 2 r ) > 7V Wri — 1 is sat¬ 
isfied. For a given to, if A r 0 + 1 is strictly larger than 
N Uri then the grid must be truncated to obtain a maxi¬ 
mum frequency ito max smaller or equal to the maximum 
frequency available in the data -i- Given that pro¬ 

cedure, m is chosen such that the number of Matsubara 
frequencies does not exceed N d . 
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